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Abstract 

We propose a rigorous framework for Uncertainty Quantification (UQ) in which 
the UQ objectives and the assumptions/information set are brought to the forefront. 
This framework, which we call Optimal Uncertainty Quantification (OUQ), is based 
on the observation that, given a set of assumptions and information about the 
problem, there exist optimal bounds on uncertainties: these are obtained as values 
of well-defined optimization problems corresponding to extremizing probabilities 
of failure, or of deviations, subject to the constraints imposed by the scenarios 
compatible with the assumptions and information. In particular, this framework 
does not implicitly impose inappropriate assumptions, nor does it repudiate relevant 
information. 

Although OUQ optimization problems are extremely large, we show that un- 
der general conditions they have finite-dimensional reductions. As an application, 
we develop Optimal Concentration Inequalities (OCI) of Hoeffding and McDiarmid 
type. Surprisingly, these results show that uncertainties in input parameters, which 
propagate to output uncertainties in the classical sensitivity analysis paradigm, may 
fail to do so if the transfer functions (or probability distributions) are imperfectly 
known. We show how, for hierarchical structures, this phenomenon may lead to the 
non-propagation of uncertainties or information across scales. 

In addition, a general algorithmic framework is developed for OUQ and is tested 
on the Caltech surrogate model for hypervelocity impact and on the seismic safety 
assessment of truss structures, suggesting the feasibility of the framework for im- 
portant complex systems. 

The introduction of this paper provides both an overview of the paper and a 
self-contained mini-tutorial about basic concepts and issues of UQ. 
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1 Introduction 

1.1 The UQ problem 

This paper sets out a rigorous and unified framework for the statement and solution of 
uncertainty quantification (UQ) problems centered on the notion of available informa- 
tion. In general, UQ refers to any attempt to quantitatively understand the relationships 
among uncertain parameters and processes in physical processes, or in mathematical and 
computational models for them; such understanding may be deterministic or probabilis- 
tic in nature. However, to make the discussion specific, we start the description of the 
proposed framework as it applies to the certification problem; Section 3 gives a broader 
description of the purpose, motivation and applications of UQ in the proposed framework 
and a comparison with current methods. 

By certification we mean the problem of showing that, with probability at least 1 — e, 
the real-valued response function G of a given physical system will not exceed a given 
safety threshold a. That is, we wish to show that 



In practice, the event [G(X) > a] may represent the crash of an aircraft, the failure of a 
weapons system, or the average surface temperature on the Earth being too high. The 
symbol P denotes the probability measure associated with the randomness of (some of) 
the input variables X of G (commonly referred to as "aleatoric uncertainty"). 

Specific examples of values of e used in practice are: 10 -9 in the aviation industry (for 
the maximum probability of a catastrophic event per flight hour, see [83, p. 581] and [15]), 
in the seismic design of nuclear power plants [28, 23] and 0.05 for the collapse of soil 
embankments in surface mining [36, p. 358]. In structural engineering [31], the maximum 
permissible probability of failure (due to any cause) is 10 -4 K s nd/n r (this is an example 
of e) where rid is the design life (in years), n r is the number of people at risk in the event 
of failure and K s is given by the following values (with 1/year units): 0.005 for places of 
public safety (including dams); 0.05 for domestic, office or trade and industry structures; 
0.5 for bridges; and 5 for towers, masts and offshore structures. In US environmental 
legislation, the maximum acceptable increased lifetime chance of developing cancer due 
to lifetime exposure to a substance is 10 -6 [53] ([44] draws attention to the fact that 
"there is no sound scientific, social, economic, or other basis for the selection of the 
threshold 10 -6 as a cleanup goal for hazardous waste sites"). 

One of the most challenging aspects of UQ lies in the fact that in practical appli- 
cations, the measure P and the response function G are not known a priori. This lack 
of information, commonly referred to as "epistemic uncertainty" , can be described pre- 
cisely by introducing A, the set of all admissible scenarios (f, (j) for the unknown - 
or partially known — reality (G, P). More precisely, in those applications, the available 
information does not determine (G, P) uniquely but instead determines a set A such that 
any (/,//) G A could a priori be (G, P). Hence, A is a (possibly infinite-dimensional) set 
of measures and functions defining explicitly information on and assumptions about G 
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Figure 1.1: You are given one pound of play- dough and a seesaw balanced around m. 
How much mass can you put on right hand side of a while keeping the seesaw balanced 
around m? The solution of this optimization problem can be achieved by placing any 
mass on the right hand side of a, exactly at a (to place mass on [a, 1] with minimum 
leverage towards the right hand side of the seesaw) and any mass on the left hand side 
of a, exactly at (for maximum leverage towards the left hand side of the seesaw). 



and P. In practice, this set is obtained from physical laws, experimental data and expert 
judgment. It then follows from (G, P) £ A that 

inf fJ,[f(X) >a}< F[G(X) > a] < sup fJ,[f(X) > a}. (1.2) 

Moreover, it is elementary to observe that 

• The quantities on the right-hand and left-hand of (1.2) are extreme values of 
optimization problems and elements of [0, 1]. 

• Both the right-hand and left-hand inequalities are optimal in the sense that they 
are the sharpest bounds for ¥[G(X) > a] that are consistent with the information 
and assumptions A. 

More importantly, in Proposition 2.1, we show that these two inequalities provide suffi- 
cient information to produce an optimal solution to the certification problem. 

Example 1.1. To give a very simple example of the effect of information and optimal 
bounds over a class A, consider the certification problem (1.1) when Y := G{X) is a 
real- valued random variable taking values in the interval [0, 1] and a G (0, 1); to further 
simplify the exposition, we consider only the upper bound problem, suppress dependence 
upon G and X and focus solely on the question of which probability measures vonM 
are admissible scenarios for the probability distribution of Y. So far, any probability 
measure on [0, 1] is admissible: 

A = {v | v is a probability measure on [0, 1]}. 

and so the optimal upper bound in (1.2) is simply 

P|X > a] < supv[Y > a] = 1. 

Now suppose that we are given an additional piece of information: the expected value of 
Y equals m £ (0, a). These are, in fact, the assumptions corresponding to an elementary 
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Markov inequality, and the corresponding admissible set is 

v is a probability measure on [0, 1] , 1 
E„[Y] = m J ' 

The least upper bound on P[Y > a] corresponding to the admissible set AmAv is the 
solution of the infinite dimensional optimization problem 

sup u[Y > a] (1.3) 

Formulating (1.3) as a mechanical optimization problem (see Figure 1.1), it is easy to 
observe that the extremum of (1.3) can be achieved only considering the situation where 
v is the weighted sum of mass a Dirac at (with weight 1 — p) and a mass of Dirac at 
a (with weight p). It follows that (1.3) can be reduced to the simple (one-dimensional) 
optimization problem: Maximize p subject to ap = m. It follows that Markov's inequality 
is the optimal bound for the admissible set Amt\lv 

F[Y >a}< sup u[Y > a] = f . (1.4) 

In some sense, the OUQ framework that we present in this paper is the the extension of 
this procedure to situations in which the admissible class A is complicated enough that 
a closed-form inequality such as Markov's inequality is unavailable, but optimal bounds 
can nevertheless be computed using reduction properties analogous to the one illustrated 
in Figure 1.1. 



-4-Mrkv = i V 



1.2 Motivating physical example and outline of the paper 

Section 2 gives a formal description of the Optimal Uncertainty Quantification frame- 
work. In order to help intuition, we will illustrate and motivate our abstract definitions 
and results with a practical example: an analytical surrogate model for hypervelocity 
impact. 

The physical system of interest is one in which a 440C steel ball (440C is a standard, 
i.e. a grade of steel) of diameter D p = 1.778 mm impacts a 440C steel plate of thickness h 
(expressed in mm) at speed v (expressed in km • s _1 ) at obliquity 8 from the plate normal. 
The physical experiments are performed at the California Institute of Technology SPHIR 
(Small Particle Hypervelocity Impact Range) facility (see Figure 1.2). An analytical 
surrogate model was developed to approximate the perforation area (in mm 2 ) caused by 
this impact scenario. The surrogate response function is as follows: 

H{h, 6, v) = K (Jj-y ( cos e T ( tanh (^T _1 )) ' ( L5 ) 

where the ballistic limit velocity (the speed below which no perforation area occurs) is 
given by 
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Figure 1.2: Experimenal set up. (a) Stainless steel spherical projectiles and nylon sabots, 
(b) Target plate held at the end of the gun barrel (c) Perforation of the target plate (d) 
General view of the Small Particle Hypervelocity Impact Range (SPHIR) Facility at 
Caltech (e) plate thickness h, plate obliquity 9 and projectile velocity v. 



The seven quantities Hq, s, n, K, p, u and m are fitting parameters that have been chosen 
to minimize the least-squares error between the surrogate and a set of 56 experimental 
data points; they take the values 

H = 0.5794km • s -1 , s = 1.4004, n = 0.4482, K = 10.3936mm 2 , 
p = 0.4757, u = 1.0275, m = 0.4682. 

Hence, in this illustrative example, H(h, 9, v) will be our response function G{X\ , X2, X%) 
and we will consider cases in which H is perfectly and imperfectly known. In Section 
7, we will apply the OUQ framework to the seismic safety assessment of structures and 
consider a more complex example involving a large number of variables. 
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1.2.1 Formulation of the admissible set and reduction theorems. 



For the example considered here, we will assume that the input parameters h, 9 and v 
are random variables, of unknown probability distribution F and of given range 



(h,e,v) g x 
heXi 

v£X 3 



X l xX 2 * X 3 , 

[1.524, 2.667] mm = [60, 105] mils, 
[2.1,2.8] km • s -1 . 



(1.7) 



We will measure lengths in both mm and mils (recall that 1 mm = 39.4 mils). 

We will adopt the "gunner's perspective" that failure consists of not perforating 
the plate, and therefore seek to obtain an optimal bound on the probability of non- 
perforation, i.e. ¥[H < 0], with possibly incomplete information on P and H. 

Assuming H to be known, if the information on P is limited to the knowledge that 
velocity, impact obliquity and plate thickness are independent random variables and that 
the mean perforation area lies in a prescribed range [mi, m.2] := [5.5, 7.5] mm 2 , then this 
information describes the admissible set Ah, where 



(H,fJ.) 



H given by (1.5), 
mi = 5.5mm 2 < E„[U] < m2 



7.5 mm 2 



(1.8) 



If the information on H is limited to values of Oscj(-ff), the component-wise oscil- 
lations (defined below, it is a least upper bound on how a change in variable i affects 
the response function), and if the information on P is as above, then the corresponding 
admissible set is -4mcD> which corresponds to the assumptions of McDiarmid's inequality 
[57], and is defined by 



AmcD ■= I (f,fJ>) 



\i = fii (8) fj, 2 » ^3, 
mi = 5.5 mm 2 < E^[/] < m2 = 7.5 mm 2 , 
0sci(/) < OsCi(H) for i = 1,2,3 



Definition 1.1. Let X := X\ X • • • X X m and consider a function /: X 
i = 1, . . . , m, we define the component- wise oscillations 

Oscj(/) := sup sup \f(...,Xi, ...)-/(..., x'i,... )\. 

(xi,...,Xm)&X x'^Xi 



(1.9) 



For 



(1.10) 



Thus, Oscj(/) measures the maximum oscillation of / in the i factor. 

Remark 1.2. The explicit expression (1.5) of H and the ranges (1.7) allow us to compute 
the component- wise oscillations Osci(H), which are, respectively, 8.86mm 2 , 4.17mm 2 
and 7.20 mm 2 for thickness, obliquity, and velocity. 
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In general, for any admissible set A of function/measure pairs for the perforation 
problem, we define 

U(A):= sup fj,[f(h,6,v) < 0]. (1.11) 

In this notation, the optimal upper bounds on the probability of non-perforation, given 
the information contained in Ah and -4mcD> are 14 (Ah) and U(Amcd) respectively. 

In Ah the response function is exactly known whereas in .4mcD it is imperfectly 
known (the information on the response function is limited to its component- wise oscil- 
lations OsCj(iT)). Both Ah and ^4mcD describe epistemic uncertainties (since in Ah the 
probability distributions of thickness, obliquity, and velocity are imperfectly known). 
AmcD is the set of response functions / and probability measures fj, that could be H 
and P given the information contained in (i.e. the constraints imposed by) Osci(H), the 
independence of the input variables and the bounds mi and 777-2 on the mean perfora- 
tion area. U(Amcd) quantifies the worst case scenario, i.e. the largest probability of 
non-perforation given what H and P could be. 



Reduction theorems. The optimization variables associated with U(Ah) are ten- 
sorizations of probability measures on thickness h, on obliquity and velocity v. This 
problem is not directly computational tractable since finding the optimum appears to re- 
quire a search over the spaces of probability measures on the intervals [1.524, 2.667] mm, 
[0, |£] and [2.1, 2.8] km • s _1 . However, in Section 4 (Theorem 4.1 and Corollary 4.4) we 
show that, since the constraint m\ < E^[i2] < 7772 is multi-linear in fi±,fJ>2 and us, the 
optimum U(Ah) can be achieved by searching among those measures n whose marginal 
distributions on each of the three input parameter ranges have support on at most two 
points. That is, 

U(A H )=U(A A ), (1.12) 
where the reduced feasible set Aa is given by 



A A 



H given by (1.5), 

M = Ml ® ^2 <8> M3> 
Hi £ Ai(Xi) for % = 1,2,3, 
mi < E^H] < m 2 



(1.13) 



where 



Ai(A'i) := {aS x o + (1 - a)5 x i \ x j G Xi, for j = 0,1 and a G [0, 1]} 



denotes the set of binary convex combinations of Dirac masses on X{. 

More generally, although the OUQ optimization problems (1.2) are extremely large, 
we show in Section 4 that an important subclass enjoys significant and practical finite- 
dimensional reduction properties. More precisely, although the optimization variables 
(/, fj,) live in a product space of functions and probability measures, for OUQ problems 
governed by linear inequality constraints on generalized moments, we demonstrate in 
Theorem 4.1 and Corollary 4.4 that the search can be reduced to one over probability 
measures that are products of finite convex combinations of Dirac masses with explicit 
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upper bounds on the number of Dirac masses. Moreover, all the results in this paper 
can be extended to sets of extreme points (extremal measures) more general than Dirac 
masses, such as those described by Dynkin [24]; we have phrased the results in terms of 
Dirac masses for simplicity. 

Furthermore, when all constraints are generalized moments of functions of /, the 
search over admissible functions reduces to a search over functions on an m-fold product 
of finite discrete spaces, and the search over m-fold products of finite convex combinations 
of Dirac masses reduce to the products of probability measures on this m-fold product 
of finite discrete spaces. This latter reduction, presented in Theorem 4.7, completely 
eliminates dependency on the coordinate positions of the Dirac masses. With this result, 
the optimization variables of U(Amcd) can be reduced to functions and products of 
probability measures on {0, l} 3 . 



1.2.2 Optimal concentration inequalities. 

Concentration-of-measure inequalities can be used to obtain upper bounds on U(Ah) 
and U{Amcd)', in that sense, they lead to sub-optimal methods. Indeed, according to 
McDiarmid's inequality [57, 58], for all functions / of m independent variables, one must 
have 2 

/*[/(*!, . . . ,X m ) - E,[f] >a}< exp (~ 2 ^m ( q sc . (/))2 ) • (1-14) 
Application of this inequality to (1.9) (using E^[/] > mi = 5.5 mm 2 ) yields the bound 

(2m 2 \ 
', m2 = 66 - 4% - ( L15 ) 
£i=i Osci(Hyj 

Note that U(Amcd) '■= su P(/,At)e^ Mc D ^ — ^] * s * ne least upper bound on the probability 
of non-perforation ¥[H = 0] given the information contained in the admissible set (1.9). 

In Section 5, the reduction techniques of Section 4 are applied to obtain optimal 
McDiarmid and Hoeffding inequalities, i.e. optimal concentration-of-measure inequali- 
ties with the assumptions of McDiarmid's inequality [57] or Hoeffding's inequality [35]. 
In particular, Theorems 5.1, 5.2 and 5.4 provide analytic solutions to the McDiarmid 
problem for dimension m = 1,2,3, and Proposition 5.7 provides a recursive formula 
for general m, thereby providing an optimal McDiarmid inequality in these cases. In 
Theorems 5.11 and 5.13, we give analytic solutions under Hoeffding's assumptions. A 
noteworthy result is that the optimal bounds associated with McDiarmid's and Hoeffd- 
ing's assumptions are the same for m = 2 but may be distinct for m = 3, and so, in 
some sense, information about linearity or non-linearity of the response function has a 
different effect depending upon the dimension m of the problem. 



Non-propagation of uncertainties. For m = 2, define A% to be the space of all 
functions / and measure fi such that [i = fi\ (g) [x<i and OsCj(/) < Dj. The optimal 
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concentration-of-measure inequality with the assumptions of McDiarmid's inequality, 
Theorem 5.2, states that 




if B x + D 2 > a 



(/,M)G^2 



sup n[f(X 1 ,X 2 )-E^[f] >a] = < 



(£>! + D 2 - a) 2 
±D X D 2 



if \D X -D 2 \ < a < D\ + D 2 , 



max(L>i, D 2 ) 



if < a < |L>i - D 2 \. 



(1.16) 



Observe that if D 2 + a < D\ , then the optimal bound does not depend on D 2 , and there- 
fore, any decrease in D 2 does not improve the inequality. These explicit bounds show 
that, although uncertainties may propagate for the true values of G and P (as expected 
from the sensitivity analysis paradigm), they may fail to do so when the information 
is incomplete on G and P and the objective is the maximum of fi[f > a] compatible 
with the given information. The non-propagation of input uncertainties is a non-trivial 
observation related to the fact that some of the constraints defining the range of the 
input variables may not be realized by the worst-case scenario (extremum of the OUQ 
problem). We have further illustrated this point in Section 8 and shown that for systems 
characterized by multiple scales or hierarchical structures, information or uncertainties 
may not propagate across scales. Note that the m = 2 case does not apply to the SPHIR 
example (since (1.9) involves three variables, i.e. it requires m = 3). 

Application to the SPHIR facility admissible set (1.9). For m = 3, the "optimal 
McDiarmid inequality" of Theorem 5.4 and Remark 4.2 provides the upper bound 



Remark 5.6 also shows that reducing the uncertainty in obliquity (described by the 
constraint Oscj(/) < Oscj(if) in (1.9) for i = obliquity) does not affect the least upper 
bound (1.17). Recall that U(Amcd) is the least upper bound on the probability that 
the perforation is zero given that the mean perforation area is in between 5.5 mm 2 and 
7.5 mm 2 and the constraints imposed by the independence, ranges and component- wise 
oscillations associated with the input random variables. 

The difference between (1.15) and (1.17) lies in the fact that 66.4% is non-optimal 
whereas 43.7% is the least upper bound on the probability of non perforation given the 
information contained in AmcD- 43.7% is a direct function of the information contained 
in .AmcD and Section 2 shows how admissible sets with higher information content lead 
to smaller least upper bounds on the probability of non perforation. Section 2 also shows 
how such admissible sets can be constructed, in the OUQ framework, via the optimal 
selection of experiments. 

1.2.3 Computational framework. 

With access to H, not just its componentwise oscillations, even sharper bounds on the 
probability of non-perforation can be calculated. Although we do not have an analytical 




(1.17) 
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Figure 1.3: For #supp( / Uj) < 2, i = 1,2,3, the maximizers of the OUQ problem (1.12) 
associated with the information set (1.8) collapse to two-point (as opposed to eight- 
point) support. Velocity and obliquity marginals each collapse to a single Dirac mass, 
while the plate thickness marginal collapses to have support on the extremes of its range. 
Note the perhaps surprising result that the probability of non-perforation is maximized 
by a distribution supported on the minimal, not maximal, impact obliquity. 
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Figure 1.4: Time evolution of the genetic algorithm search for the OUQ problem (1.12) 
associated with the information set (1.8) ((1.13) after reduction) for #supp(/Xj) < 2 
for i = 1,2,3, as optimized by mystic. Thickness quickly converges to the extremes 
of its range, with a weight of 0.621 at 60 mils and a weight of 0.379 at 105 mils. The 
degeneracy in obliquity at causes the fluctuations seen in the convergence of obliquity 
weight. Similarly, velocity converges to a single support point at 2.289 km ■ s , the 
ballistic limit velocity for thickness 105 mils and obliquity (see (1.6)). 
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formula for U(Ah), its calculation is made possible by the identity (1.12) derived from 
the reduction results of Section 4. A numerical optimization over the finite-dimensional 
reduced feasible set Aa using a Differential Evolution [ !] optimization algorithm im- 
plemented in the mystic framework [60] (see Subsection 6.3) yields the following optimal 
upper bound on the probability of non-perforation: 

F[H = 0] < U(A H ) = U(A A ) n = m 37.9%. 

Observe that U ¥[H = 0] < U{A) = U(Aa)" is a theorem, whereas U U{A A ) n = m 37.9%" is 
the output of an algorithm (in this genetic algorithm implemented in the mystic 

framework described in Subsection 6.3). In particular, its validity is correlated with the 
efficiency of the specific algorithm. We have introduced the symbol to emphasize 
the distinction between mathematical (in) equalities and numerical outputs. 

Although we do not have a theorem associated with the convergence of the numerical 
optimization algorithm, we have a robust control over its efficiency because it is applied 
to the finite dimensional problem U(Aa) instead of the infinite optimization problem 
associated with U(Ah) (thanks to the reduction theorems obtained in Section 4). 

We also observe that the maximizer U{Ah) can be of significantly smaller dimension 
than that of the elements of U{Aa)- Indeed, for #supp(/ij) < 2, i = 1,2,3 (where 
#supp(^j) is the number of points forming the support of Hi), Figure 1.3 shows that 
numerical simulations collapse to two-point support: the velocity and obliquity marginals 
each collapse to a single Dirac mass, and the plate thickness marginal collapses to have 
support on the two extremes of its range. See Figure 1.4 for plots of the locations and 
weights of the Dirac masses forming each marginal Hi as functions of the number of 
iterations. Note that the lines for thickness and thickness weight are of the same color 
if they correspond to the same support point for the measure. 

In Section 6 we observe that, even when a wider search is performed (i.e. over mea- 
sures with more than two-point support per marginal), the calculated maximizers for 
these problems maintain two-point support. This observation suggests that the extreme 
points of the reduced OUQ problems are, in some sense, attractors and that adequate 
numerical implementation of OUQ problems can detect and use "hidden" reduction 
properties even in the absence of theorems proving them to be true. Based on these 
observations, we propose, in Section 6, an OUQ optimization algorithm for arbitrary 
constraints based on a coagulation/fragmentation of probability distributions. 

The simulations of Figures 1.3 and 1.4 show that extremizers are singular and that 
their support points identify key players, i.e. weak points of the system. In particular, for 
U(Ah), the location of the two-point support extremizer shows that reducing maximum 
obliquity and the range of velocity will not decrease the optimal bound on the probability 
of non perforation, and suggests that reducing the uncertainty in thickness will decrease 
this bound. In Section 2, we will show that the OUQ framework allows the development 
of an OUQ loop that can be used for experimental design. In particular, we will show 
that the problem of predicting optimal bounds on the results of experiments under the 
assumption that the system is safe (or unsafe) is well-posed and benefits from similar 
reduction properties as the certification problem. Best experiments are then naturally 
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identified as those whose predicted ranges have minimal overlap between safe and unsafe 
systems. 

1.2.4 Outline of the paper. 

Section 2 formally describes the Optimal Uncertainty Quantification framework and 
what it means to give optimal bounds on the probability of failure in (1.1) given (lim- 
ited) information/assumptions about the system of interest, and hence how to rigorously 
certify or de-certify that system. Section 3 shows that many important UQ problems, 
such as prediction, verification and validation, can be formulated as certification prob- 
lems. Section 3 also gives a comparison of OUQ with other widely used UQ methods. 
Section 4 shows that although OUQ optimization problems (1.2) are (a priori) infinite- 
dimensional, they can (in general) be reduced to equivalent finite-dimensional problems 
in which the optimization is over the extremal scenarios of A and that the dimension 
of the reduced problem is a function of the number of probabilistic inequalities that 
describe A. Just as a linear program finds its extreme value at the extremal points of a 
convex domain in IR n , OUQ problems reduce to searches over finite-dimensional families 
of extremal scenarios. Section 5 applies the results of Section 4 to obtain optimal con- 
centration inequalities under the assumptions of McDiarmid's inequality and Hoeffding's 
inequality. Those inequalities show that, although uncertainties may propagate for the 
true values of G and P, they might not when the information is incomplete on G and P. 
Section 6 discusses the numerical implementation of OUQ algorithms for the analytical 
surrogate model (1.5) for hypervelocity impact. Section 7 assesses the feasibility of the 
OUQ formalism by means of an application to the safety assessment of truss structures 
subjected to ground motion excitation. This application contains many of the features 
that both motivate and challenge UQ, including imperfect knowledge of random inputs 
of high dimensionality, a time-dependent and complex response of the system, and the 
need to make high-consequence decisions pertaining to the safety of the system. Section 
8 applies the OUQ framework and reduction theorems of sections 4 and 5 to divergence 
form elliptic PDEs. A striking observation of Section 8 is that with incomplete informa- 
tion on the probability distribution of the microstructure, uncertainties or information 
do not necessarily propagate across scales. Section 9 emphasizes that the "UQ prob- 
lem" (as it is treated in common practice today) appears to have all the symptoms of an 
ill-posed problem; and that, at the very least, it lacks a coherent general presentation, 
much like the state of probability theory before its rigorous formulation by Kolmogorov 
in the 1930s. It also stresses that OUQ is not the definitive answer to the UQ problem, 
but an opening gambit. Section 10 gives the proofs of our main results. 

2 Optimal Uncertainty Quantification 

In this section, we describe more formally the Optimal Uncertainty Quantification frame- 
work. In particular, we describe what it means to give optimal bounds on the probability 
of failure in (1.1) given information/assumptions about the system of interest, and hence 
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how to rigorously certify or de-certify that system. 

For the sake of clarity, we will start the description of OUQ with deterministic 
information and assumptions (when A is a deterministic set of functions and probability 
measures). 

2.1 First description 

In the OUQ paradigm, information and assumptions lie at the core of UQ: the available 
information and assumptions describe sets of admissible scenarios over which optimiza- 
tions will be performed. As noted by Hoeffding [3 '], assumptions about the system of 
interest play a central and sensitive role in any statistical decision problem, even though 
the assumptions are often only approximations of reality. 

A simple example of an information/assumptions set is given by constraining the 
mean and range of the response function. For example, let Ai(X) be the set of probability 
measures on the set X , and let A\ be the set of pairs of probability measures fi G M.(X) 
and real-valued measurable functions / on X such that the mean value of / with respect 
to li is b and the diameter of the range of / is at most D; 



Let us assume that all that we know about the "reality" (G, P) is that (G, P) G At. 
Then any other pair (/, li) G A\ constitutes an admissible scenario representing a valid 
possibility for the "reality" (G, P). If asked to bound P[G(X) > a], should we apply 
different methods and obtain different bounds on ¥[G(X) > a]? Since some methods 
will distort this information set and others are only using part of it, we instead view set 
Ai as a feasible set for an optimization problem. 

The general OUQ framework. In the general case, we regard the response function 
G as an unknown measurable function, with some possibly known characteristics, from 
one measurable space X of inputs to a second measurable space y of values. The input 
variables are generated randomly according to an unknown random variable X with 
values in X according to a law P G M{X), also with some possibly known characteristics. 
We let a measurable subset y$ Q y define the failure region; in the example given above, 
3^ = R and 3^o = [ a i +oo). When there is no danger of confusion, we shall simply write 
[G fails] for the event [G(X) G y ]. 

Let e G [0, 1] denote the greatest acceptable probability of failure. We say that the 
system is safe if ¥[G fails] < e and the system is unsafe if W[G fails] > e. By information, 
or a set of assumptions, we mean a subset 




(2.1) 
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that contains, at the least, (G, P). The set A encodes all the information that we have 
about the real system (G, P), information that may come from known physical laws, past 
experimental data, and expert opinion. In the example A\ above, the only information 
that we have is that the mean response of the system is b and that the diameter of 
its range is at most D; any pair (/, (i) that satisfies these two criteria is an admissible 
scenario for the unknown reality (G, P). Since some admissible scenarios may be safe 
(i.e. have fi[f fails] < e) whereas other admissible scenarios may be unsafe (i.e. have 
jj,[f fails] > e), we decompose A into the disjoint union A = ^4 S afe,e W A unsa ,f e<f , where 

Aafe,e := {(/, fj)€A\ n\J fails] < e}, (2.3a) 
Amsafc.e := {(/, n) € A \ fi[f fails] > e}. (2.3b) 

Now observe that, given such an information/assumptions set A, there exist up- 
per and lower bounds on W[G(X) > a] corresponding to the scenarios compatible with 
assumptions, i.e. the values C(A) and U(A) of the optimization problems: 

£(A) := inf n\f fails] (2.4a) 

U{A):= sup (i[f fails]. (2.4b) 

Since C(A) and U(A) are well-defined in [0,1], and approximations are sufficient for 
most purposes and are necessary in general, the difference between sup and max should 
not be much of an issue. Of course, some of the work that follows is concerned with 
the attainment of maximizers, and whether those maximizers have any simple structure 
that can be exploited for the sake of computational efficiency, and this is the topic of 
Section 4. For the moment, however, simply assume that £{A) and IA{A) can indeed be 
computed on demand. Now, since (G, P) G A, it follows that 

C(A) <F[G fails] <U{A). 

Moreover, the upper bound U(A) is optimal in the sense that 

H[f fails] < U{A) for all (/,//) G A 

and, if U' < U(A), then there is an admissible scenario (/, /u) G A such that 

U' < fi[f fails] <U{A). 

That is, although ¥[G fails] may be much smaller than U(A), there is a pair (/, /i) which 
satisfies the same assumptions as (G, P) such that /j,[f fails] is approximately equal to 
U{A). Similar remarks apply for the lower bound C{A). 

Moreover, the values C(A) and U(A), defined in (2.4) can be used to construct a 
solution to the certification problem. Let the certification problem be defined by an error 
function that gives an error whenever 1) the certification process produces "safe" and 
there exists an admissible scenario that is unsafe, 2) the certification process produces 
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C(A) := inf a[f{X)>a] 


U{A) := sup y,[f(X) > a] 


< e 


Cannot decide 

Insufficient Information 


Certify 
Safe even in the Worst Case 


> e 


De-certify 
Unsafe even in the Best Case 


Cannot decide 

Insufficient Information 



Table 2.1: The OUQ certification process provides a rigorous certification criterion whose 
outcomes are of three types: "Certify" , "De-certify" and "Cannot decide" . 

"unsafe" and there exists an admissible scenario that is safe, or 3) the certification 
process produces "cannot decide" and all admissible scenarios are safe or all admissible 
points are unsafe; otherwise, the certification process produces no error. The following 
proposition demonstrates that, except in the special case C{A) = e, that these values 
determine an optimal solution to this certification problem. 

Proposition 2.1. If (G,F) G A and 

• U{A) < e then P[G fails] < e. 

• e < C{A) then F[G fails] > e. 

• C(A) < e < U(A) the there exists (fi,[ii) £ A and (/b,/^) £ A such that 

f ails \ < e < M2[/2 fails]. 

In other words, provided that the information set A is valid (in the sense that (G, P) 6 
A) then if U(A) < e, then, the system is provably safe; if e < C(A), then the system 
is provably unsafe; and if C{A) < e < U(A), then the safety of the system cannot 
be decided due to lack of information. The corresponding certification process and its 
optimality are represented in Table 2.1. Hence, solving the optimization problems (2.4) 
determines an optimal solution to the certification problem, under the condition that 
C{A) 7^ e. When C(A) = e we can still produce an optimal solution if we obtain further 
information. That is, when C{A) = e = U(A), then the optimal process produces "safe". 
On the other hand, when C(A) = e < U(A), the optimal solution depends on whether 
or not there exists a minimizer (/, fj,) E A such that fj,[f fails] = C(A); if so, the optimal 
process should declare "cannot decide", otherwise, the optimal process should declare 
"unsafe". Observe that, in Table 2.1, we have classified C(A) = e < U{A) as "cannot 
decide". This "nearly optimal" solution appears natural and conservative without the 
knowledge of the existence or non-existence of optimizers. 

Example 2.1. The bounds C(A) and U(A) can be computed exactly — and are non- 
trivial — in the case of the simple example *4i given in (2.1). Indeed, writing x+ := 
max(x,0), the optimal upper bound is given by 

U(At)= Pmax :=(l-^^j , (2.5) 
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where the maximum is achieved by taking the measure of probability of the random 
variable f(X) to be the weighted sum of two weighted Dirac delta masses 1 



This simple example demonstrates an extremely important point: even if the function 
G is extremely expensive to evaluate, certification can be accomplished without recourse 
to the expensive evaluations of G. Furthermore, the simple structure of the maximizers 
motivates the reduction theorems later in Section 4. 

Example 2.2. As shown in Equation (1.14), concentration-of-measure inequalities lead 
to sub-optimal methods in the sense that they can be used to obtain upper bounds 
on U(A) and lower bounds on C{A). Observe that McDiarmid's inequality (1.14) re- 
quired an information/assumptions set -4mcD where the space X is a product space 
with X = (Xi,X2, . . . ,X m ), the mean performance satisfies E[G(X)] < b, the m inputs 
Xi, . . . ,X m are independent, and the component-wise oscillations of G, (see (1.10)) are 
bounded Oscj(G) < Dj. It follows from McDiarmid's inequality (1.14) that, under the 
assumptions -4mcD> 



This example shows how existing techniques such as concentration-of-measure inequali- 
ties can be incorporated into OUQ. In Section 4, we will show how to reduce U(Amcd) to 
a finite dimensional optimization problem. Based on this reduction, in Section 5, we pro- 
vide analytic solutions to the optimization problem IA{Amcd) for m = 1, 2, 3. In practice, 
the computation of the bounds Di require the resolution of an optimization problem, 
we refer to [52, 46, 1] for practical methods. We refer to [52, 46, 1, 78] for illustrations 
of UQ through concentration of measure inequalities. In particular, since Oscj(G) is a 
semi-norm, a (possibly numerical) model can be used to compute bounds on component- 
wise oscillations of G via the triangular inequality Oscj(G) < Oscj(F) + Oscj(G — F) (we 
refer to [52, 46, 1] for details, the idea here is that an accurate model leads to a reduced 
number of experiments for the computation of OsCj(G — F), while the computation of 
Osci(F) does not involve experiments). In the sequel we will refer to Di g '■= Oscj(G) 
(for i = 1, . . . , m) as the sub- diameters of G and to 



as the diameter of G. Bounds on Oscj(G) are useful because they constitute a form of 
non-linear sensitivity analysis and, combined with independence constraints, they lead 

5 Z is the Dirac delta mass on z, i.e. the measure of probability on Borel subsets A C R such that 
8z(A) = 1 if z 6 A and S Z (A) — otherwise. The first Dirac delta mass is located at the minimum of 
the interval [a, oo] (since we are interested in maximizing the probability of the event fj,[f(X) > a]). The 
second Dirac delta mass is located at z — a — D because we seek to maximize p max under the constraints 
PmaxH + (1 - Pmax)^ < b and a - z < D. 



+ (1 -Pmax)<5a-D- 





(2.6) 
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A/fpflinrl 

IV J. t; LlltJU. 


A t\ /r t~i " mnPHPTlflPTIPP 1 1 Q "i~ 1 nn o TT H lTIPtin 

constraints as given by (1.9) 


< 66 4% 

= 43.7% 


A/Tr'T^i n T*nn i r\ m onn ci 1 1 i~\r 

Theorem 5.4 


Ah as given by (1.8) 


n ^ m 37.9% 


mystic, H known 


f/ TT \ w-median velocity] 
A *"{( H >ti =2.45km.s- ) 


n ^ m 30.0% 


mystic, H known 


Ah H { (iJ, //) //-median obliquity = ^ } 


n ^ m 36.5% 


mystic, H known 


_4ff H {(H,fi) obliquity = ? /i-a.s.} 


n ^ m 28.0% 


mystic, H known 



Table 2.2: Summary of the upper bounds on the probability of non-perforation for Ex- 
ample (1.8) obtained by various methods and assumptions. Note that OUQ calculations 
using mystic (described in Section 6) involve evaluations of the function H, whereas 
McDiarmid's inequality and the optimal bound given the assumptions of McDiarmid's 
inequality use only the mean of H and its McDiarmid subdiameters, not H itself. Note 
also that the incorporation of additional information/assumptions, e.g. on impact obliq- 
uity, always reduces the OUQ upper bound on the probability of failure, since this cor- 
responds to a restriction to a subset of the original feasible set Ah for the optimization 
problem. 



to the concentration of measure phenomenon. The OUQ methodology can also handle 
constraints of the type \\G — F\\ < C (which are not sufficient to observe take advantage 
of the concentration of measure effect) and G(xi) = Zi [90]. 

Example 2.3. For the set Ah given in Equation (1.8), the inclusion of additional in- 
formation further reduces the upper bound U{Ah)- Indeed, the addition of assumptions 
lead to a smaller admissible set Ah •->• A 1 C Ah, therefore U decreases and C increases. 
For example, if the median of the third input parameter (velocity) is known to lie at the 
midpoint of its range, and this information is used to provide an additional constraint, 
then the least upper bound on the probability of non-perforation drops to 30.0%. See 
Table 2.2 for a summary of the bounds presented in the hypervelocity impact example 
introduced in Subsection 1.2, and for further examples of the effect of additional infor- 
mation/constraints. The bounds given in Table 2.2 have been obtained by using the 
reduction theorems of Section 4 and the computational framework described in Section 
6. 

Remark 2.2. The number of iterations and evaluations of H associated with Table 
2.2 are: 600 iterations and 15300 ^-evaluations (second row), 822 iterations and 22700 
-ff-evaluations (third row), 515 iterations and 14550 //-evaluations (fourth row), 760 
iterations and 18000 ^/-evaluations (fifth row). Half of these numbers of iterations are 
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usually sufficient to obtain the extrema with 4 digits of accuracy (for the third row, 
for instance, 365 iterations and 9350 .ff-evaluations are sufficient to obtain the first 4 
decimal points of the optimum). 

On the selectiveness of the information set A. Observe that, except for the 
bound obtained from McDiarmid's inequality, the bounds obtained in Table 2.2 are 
the best possible given the information contained in A. If the unknown distribution P 
is completely specified, say by restricting to the feasible set A nv ii for which the only 
admissible measure is the uniform probability measure on the cube X (in which case 
the mean perforation area is K[H] = 6.58 mm 2 ), then the probability of non-perforation 
is U(A un i{) = £(^l un if) = 3.8%. One may argue that there is a large gap between 
the fifth (28%) row of Table 2.2 and 3.8% but observe that 3.8% relies on the exact 
knowledge of G (called H here) and P whereas 28% relies on the limited knowledge 
contained in Ah H {(H, fj) | obliquity = ^ /i-a.s.} with respect to which 28% is optimal. 
In particular, the gap between those two values is not caused by a lack of tightness 
of the method, but by a lack of selectiveness of the information contained in Ah H 
{(H,fi) | obliquity = ? /i-a.s.}. The (mis)use of the terms "tight" and "sharp" without 
reference to available information (and in presence of asymmetric information) can be 
the source of much confusion, something that we hope is cleared up by the present work. 
Given prior knowledge of G and P, it would be an easy task to construct a set Ap,G 
containing (G, P) such that U [Aw,g) ~ 4% (if the probability of failure under (G, P) is 
3.8%), but doing so would be delaying a honest discussion on one of the issues at the core 
of UQ: How to construct A without prior knowledge of G and P? In other words, how 
to improve the "selectiveness" of A or how to design experiments leading to "narrow" 
.4s? We will now show how this question can be addressed within the OUQ framework. 

2.2 The Optimal UQ loop 

In the previous subsection we discussed how the basic inequality 

C(A) < P[G > a] < U(A) 

provides rigorous optimal certification criteria. The certification process should not 
be confused with its three possible outcomes (see Table 2.1) which we call "certify" 
(we assert that the system is safe), "de-certify" (we assert that the system is unsafe) 
and "cannot decide" (the safety or un-safety of the system is undecidable given the 
information/assumption set .A). Indeed, in the case 

C(A) < e < U(A) 

there exist admissible scenarios under which the system is safe, and other admissible 
scenarios under which it is unsafe. Consequently, it follows that we can make no definite 
certification statement for (G,P) without introducing further information/ assumptions. 
If no further information can be obtained, we conclude that we "cannot decide" (this 
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Selection of New Experiments 



Experimental Data 
(Legacy / On-Demand) 



Expert Judgement 



Physical 




Laws 


> 



Assumptions / Admissible Set, A 



Extreme Scale Optimizer: Calculate 
C(A) := inf{/x[/ fails] | (f,n)€A] 
U{A) := sup{/x[/ fails] | (/,//) G A} 



Certification 
Process 



Sensitivity / Robustness 
Analysis w.r.t. A 



De-Certify 
(i.e. System is 
Unsafe) 



Cannot 
Decide 




Figure 2.1: Optimal Uncertainty Quantification Loop. 



state could also be called "do not decide" , because we could (arbitrarily) decide that the 
system is unsafe due to lack of information, for instance, but do not). 

However, if sufficient resources exist to gather additional information, then we enter 
what may be called the optimal uncertainty quantification loop, illustrated in Figure 2.1. 
The admissible set A draws on three principal sources of information: known physical 
laws, expert opinion, and experimental data. Once the set A has been constructed, 
the calculation of the lower and upper bounds C{A) and IA{A) are well-posed optimiza- 
tion problems. If the results of these optimization problems lead to certification or 
de-certification, then we are done; if not, then new experiments should be designed and 
expert opinion sought in order to validate or invalidate the extremal scenarios that cause 
the inequality 

C{A) < e < U{A) 

to hold. The addition of information means further constraints on the collection of 
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admissible scenarios; that is, the original admissible set A is reduced to a smaller one 
A' C A, thereby providing sharper bounds on the probability of failure: 

£{A) < C(A') < F[G(X) >a}< U(A') < U(A). 

The sharper bounds may meet the "certify" or "decertify" criteria of Table 2.1. If not, 
and there are still resources for gathering additional information, then the loop should 
be repeated. This process is the feedback arrow on the left-hand side of Figure 2.1. 

The right-hand side of Figure 2.1 constitutes another aspect of the OUQ loop. The 
bounds C{A) and U(A) are only useful insofar as the assumptions A are accurate. It 
is possible that the sources of information that informed A may have been in error: 
physical laws may have been extended outside their range of validity (e.g. Newtonian 
physics may have been applied in the relativistic regime), there may have been difficulties 
with the experiments or the results misinterpreted, or expert opinion may have been 
erroneous. Therefore, a vital part of OUQ is to examine the sensitivity and robustness 
of the bounds C(A) and U {A) with respect to the assumption set A. If the bounds £(A) 
and U(A) are found to depend sensitively on one particular assumption (say, the mean 
performance of one component of the system), then it would be advisable to expend 
resources investigating this assumption. 

The loop illustrated in Figure 2.1 differs from the loop used to solve the numerical 
optimization problem as described in Sub-section 6.3 and Remark 6.3. 



Experimental design and selection of the most decisive experiment. An im- 
portant aspect of the OUQ loop is the selection of new experiments. Suppose that a 
number of possible experiments Ei are proposed, each of which will determine some func- 
tional $i(G,P) of G and P. For example, $i(G,P) could be E P [G], $ 2 (G,P) could be 
P[X 6 A] for some subset A C X of the input parameter space, and so on. Suppose that 
there are insufficient experimental resources to run all of these proposed experiments. 
Let us now consider which experiment should be run for the certification problem. Recall 
that the admissible set A is partitioned into safe and unsafe subsets as in (2.3). Define 
<^safe,e( ( l > i) to be the closed interval spanned by the possible values for the functional 
<3?i over the safe admissible scenarios (i.e. the closed convex hull of the range of $j on 
Aafe.e): that is, let 



safe,. 



J, 



unsafe, e 



inf $i (/,//), sup $i(/,A*) 

(/,M)eAafa, e (/,M)eAafe, £ 



mf sup 



(2.7a) 
(2.7b) 



Note that, in general, these two intervals may be disjoint or may have non-empty inter- 
section; the size of their intersection provides a measure of usefulness of the proposed 
experiment E^. Observe that if experiment Ei were run, yielding the value <&j(G,P), 
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then the following conclusions could be drawn: 

$i(G,P) G J S afc,e($i) n </ U nsafc,e($i) =>" no Conclusion, 

$i(G,P) G J S afe,e($i) \ ^un Sa fe,e(^i) =>• the system is safe, 
$i(G,P) G J U nsafe,e(*i) \ ^safo,e(^j) => the system is unsafe, 
$i(G,P) ^ J S afe,e(^i) U J U nsafc,e =>• faulty assumptions, 

where the last assertion (faulty assumptions) means that (G, P) ^ .4 and follows from 
the fact that P) ^ </ S afe,e(^i) U J U nsafe,e(^i) is a contradiction. The validity of the 

first three assertions is based on the supposition that (G, P) G A. 

In this way, the computational optimization exercise of finding J Ba £e,e(^i) and J U nsafe,e(^i 
for each proposed experiment E% provides an objective assessment of which experiments 
are worth performing: those for which J sa ,fe,e($i) and J U nsafe,e(^ ) i) are nearly disjoint 
intervals are worth performing since they are likely to yield conclusive results vis-a- 
vis (de-) certification and conversely, if the intervals J sa fe,e( < ^i) and J U nsafe,e( < ^i) have a 
large overlap, then experiment Ei is not worth performing since it is unlikely to yield 
conclusive results. Furthermore, the fourth possibility above shows how experiments 
can rigorously establish that one's assumptions A are incorrect. See Figure 2.2 for an 
illustration. 

Remark 2.3. For the sake of clarity, we have started this description by defining ex- 
periments as Junctionals §i of P and G. In practice, some experiments may not be 
functionals of P and G but of related objects. Consider, for instance, the situation 
where (X\ } X2) is a two-dimensional Gaussian vector with zero mean and covariance 
matrix C, P is the probability distribution of Xi, the experiment E2 determines the 
variance of X2 and the information set A is C G B, where B is a subset of symmetric 
positive definite 2x2 matrices. The outcome of the experiment E2 is not a function of 
the probability distribution P; however, the knowledge of P restricts the range of possi- 
ble outcomes of E<i- Hence, for some experiments Ei, the knowledge of (G, P) does not 
determine the outcome of the experiment, but only the set of possible outcomes. For 
those experiments, the description given above can be generalized to situations where 
<]?j is a multivalued functional of (G, P) determining the set of possible outcomes of the 
experiment E{. This picture can be generalized further by introducing measurement 
noise, in which case (Gr, P) may not determine a deterministic set of possible outcomes, 
but instead a measure of probability on a set of possible outcomes. 

Example 2.4 (Computational solution of the experimental selection problem). We will 
now consider again the admissible set Ah as given by (1.8). The following example 
shows that the notion of "best" experiment depends on the admissible safety threshold e 
for P[G > a]. Suppose that an experiment E is proposed that will determine the proba- 
bility mass of the upper half of the velocity range, [2.45, 2.8] km • s _1 ; the corresponding 
functional $ of study is 

:=fi[v > 2.45km -s" 1 ], 

and the proposed experiment E will determine <&(P) (approximate determinations includ- 
ing measurement and sampling errors can also be handled, but the exact determination 
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<Ai nsafe,e(j h) 

* . *- R 

•Ainsafe,e(^ ) 2) 

' . * . ■» R 

«/safc,e(^2) 

<Ainsafe,e( < ^3) 

' j "* ► R 

■/safe,e(*3) 

■Ainsafe,e($4) 

) ) * R 

^ S afe, e (^4) 

Figure 2.2: A schematic representation of the intervals J U nsafe, e(^i) (hi rec and 
J sa f e ,e{^i) (in blue) as defined by (2.7) for four functionals <5j that might be the subject 
of an experiment. <3?i is a good candidate for experiment effort, since the intervals do 
not overlap and hence experimental determination of <J?i(G, P) will certify or de-certify 
the system; $4 is not worth investigating, since it cannot distinguish safe scenarios from 
unsafe ones; $2 and $3 are intermediate cases, and <3?2 is a better prospect than $3. 



is done here for simplicity). The corresponding intervals J sa f e ,e(^) and J un safe,e( < ^) as 
defined by (2.7) and (2.3) are reported in Table 2.3 for various acceptable probabilities 
of failure e. Note that, for larger values of e, E is a "better" experiment in the sense that 
it can distinguish safe scenarios from unsafe ones (see also Figure 2.2); for smaller values 
of e, E is a poor experiment. In any case, since the intersection J S afe,e( c &) H J U nsafe,e(^) 
is not empty, E is not an ideal experiment. 

It is important to note that the optimization calculations necessary to compute the 
intervals J S afe,e(^ > ) and J U nsafe,e(^ ) ) are simplified by the application of Theorem 4.1: 
in this case, the objective function of \i is fi[v > 2.45] instead of fi[H = 0], but the 
constraints are once again linear inequalities on generalized moments of the optimization 
variable [i. 

On the number of total evaluations on H. Recall that, for simplicity, we have 
assumed the response function G to be known and given by H. A large number of 
evaluations of H has been used in Table 2.3 to ensure convergence towards the global 
optimum. It is important to observe that those evaluations of H are not (actual, costly) 
experiments but (cheap) numerical evaluations of equation (1.5). More precisely, the 
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J sake 

inf 


sup 


•Ainsa 
inf 


fe,e($) 
sup 


e = 0.100 

iterations until numerical convergence 
total evaluations of H 


0.000 

40 
1,000 


1.000 

40 
1,000 


0.000 

40 
1,000 


0.900 
300 
8,000 


e = 0.200 

iterations until numerical convergence 
total evaluations of H 


0.000 

40 
1,000 


1.000 

40 
1,000 


0.000 

40 
1,000 


0.800 
400 
12,000 


e = 0.300 

iterations until numerical convergence 
total evaluations of H 


0.000 

40 
1,000 


1.000 

40 
1,000 


0.000 

40 
1,000 


0.599 
1000 
33,000 



Table 2.3: The results of the calculation of the intervals J sa fe,e(^) an d -Amsafe.e^) for 
the functional ^(/u) := /j,[v > 2.45km • s -1 ]. Note that, as the acceptable probability of 
system failure, e, increases, the two intervals overlap less, so experimental determination 
of $(P) would be more likely to yield a decisive conservative certification of the system as 
safe or unsafe; the computational cost of this increased decisiveness is a greater number 
of function evaluations in the optimization calculations. All computational cost figures 
are approximate. 



method for selecting new best experiments does not require new experiments; i.e., it 
relies entirely on the information set A (which contains the information gathered from 
previous experiments). Hence those evaluations should not be viewed as "information 
gained from Monte Carlo samples" but as "pure CPU processing time". In situations 
where the numerical evaluation of H is expensive, one can introduce its cost in the 
optimization loop. An investigation of the best algorithm to perform the numerical 
optimization with the least number of function evaluations is a worthwhile subject but is 
beyond the scope of the present paper. Observe also that the method proposed in Section 
2 does not rely on the exact knowledge of the response function G. More precisely, in 
situations where the response function is unknown, the selection of next best experiments 
is still entirely computational, and based upon previous data/information gathered on G 
enforced as constraints in a numerical optimization algorithm. More precisely, in those 
situations, the optimization algorithm may require the numerical evaluation of a large 
number of admissible functions / (compatible with the prior information available on 
G) but it does not require any new evaluation of G. 

In situations where H is (numerically) expensive to evaluate, one would have to 
include the cost of these evaluations in the optimization loop and use fast algorithms 
exploiting the multi-linear structures associated with the computation of safe and unsafe 
intervals. Here we have used a genetic algorithm because its robustness. This algorithm 
typically converges at 10% of the total number of evaluations of H given in the last row 
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sup U(A E ,c) - C{A E ,c) 



outcomes c 




Figure 2.3: A schematic representation of the size of the prediction intervals 
su Poutcomes c {M{Ae,c) — £{Ae,c)) m the worst case with respect to outcome c. E4 is 
the most predictive experiment. 



of Table 2.3 but we have increased the number of iterations tenfold to guarantee a robust 
result. The investigation of efficient optimization algorithms exploiting the multi- linear 
structures of OUQ optimization problems is of great interest and beyond the immediate 
scope of this paper. 

Selection of the most predictive experiment. The computation of safe and unsafe 
intervals described in the previous paragraph allows of the selection of the most selective 
experiment. If our objective is to have an "accurate" prediction of ¥[G(X) > a], in the 
sense that U{A) — C(A) is small, then one can proceed as follows. Let Ae,c denote those 
scenarios in A that are compatible with obtaining outcome c from experiment E. An 
experiment E* that is most predictive, even in the worst case, is defined by a minmax 
criterion: we seek (see Figure 2.3) 

E* E argmin ( sup (U{A E ,c) ~ C(A E , C ))) (2.8) 

experiments E ^ outcomes c ' 

The idea is that, although we can not predict the precise outcome c of an experiment E, 
we can compute a worst-case scenario with respect to c, and obtain an optimal bound 
for the minimum decrease in our prediction interval for ¥[G(X) > a] based on the (yet 
unknown) information gained from experiment E. Again, the theorems given in this 
paper can be applied to reduce this kind of problem. Finding E* is a bigger problem 
than just calculating C(A) and U(A), but the presumption is that computer time is 
cheaper than experimental effort. 

Planning of campaigns of experiments. The idea of experimental selection can 
be extended to plan several experiments in advance, i.e. to plan campaigns of experi- 
ments. This aspect can be used to assess the safety or the design of complex systems 
in a minimal number of experiments (and also to predict bounds on the total number 
of required experiments). Just as a good chess player thinks several moves ahead, our 
framework allows for the design of increasingly sophisticated and optimal sequences of 
experiments that can be performed to measure key system variables. The implemen- 
tation of this strategy corresponds to a min max game "played against the Universe" 
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(a) Playing Chess against the universe 



(b) Let's play Clue, Round 1 



n(fO=G.5 = 1.0 & in7-(n) = 168.7r ) = 5.rj & constraint on.. 



/m(f/]=G.5 = 1.0 & ww(&)»il6&75 3:5 & meau(li) = 82.5 = Q.5 & constraint on. 
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(c) Let's play Clue, Round 2 



(d) Let's play Clue, Round 3 



Figure 2.4: Subfigure (a): Playing Chess Against the Universe. We choose which ex- 
periment E to perform and the universe selects the outcome c. Our objective is to 
minimize IA{A) — C(A). In the first round our possible moves correspond to a choice 
between experiments E\,E2,E% and E4. We perform experiment E4, the outcome c of 
that experiment (selected by the Universe) transforms the admissible into Ae 4 ,c- ln the 
second round, our possible moves correspond to a choice between experiments F\,F2 
and -F3. As in the game of Chess, several moves can be planned in advance by solving 
min max optimization problems, and the exponential increase of the number branches 
of the game tree can be kept under control by exploring only a subset of (best) moves. 
Subfigures (b), (c) and (d): Let's play Clue. 
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(Subfigure 2.4(a)). The well-known games of Clue/Cluedo and Twenty Questions are 
better analogies than chess for this kind of information game. In that sense, the plan- 
ning of campaigns of experiments is an infinite-dimensional Clue, played on spaces of 
admissible scenarios, against our lack of perfect information about reality, and made 
tractable by the reduction theorems. This aspect calls for more investigation since it has 
the potential to provide a new approach to the current scientific investigation paradigm, 
which is based on intuition, expert judgment, and guessing. 

Example 2.5 (Let's play Clue.). In Subfigures 2.4(b), 2.4(c) and 2.4(d) we consider 
again the admissible set Ah as given by (1.8) and select three most predictive experi- 
ments, sequentially, choosing the second one after having observed the outcome of the 
first one. The list of possible experiments corresponds to measuring the mean or vari- 
ance of thickness h, obliquity 6 or velocity v. Subfigures 2.4(b), 2.4(c) and 2.4(d) show 
U{Ah,e,c) for each of these experiments as a function of the re-normalized outcome 
value c. Since, in this example, we always have C(Ah,e,c) = 0, U{Ah,e,c) corresponds 
to the size of the prediction interval for the probability of non-perforation given the in- 
formation that the outcome of experiment E is c. Given the results shown in Subfigure 
2.4(b) we select to measure the variance of thickness as our first best experiment. Note 
that this selection does not require the knowledge of what the experiment will yield but 
only the knowledge of what the experiment can yield: we identify as the optimal next 
experiment the one that is most informative in the minimax sense, i.e. of all its possible 
outcomes (this does not require the knowledge of what the experiment will yield), its 
least informative outcome is more informative than the least informative outcome of 
any other candidate experiment. Although not necessary, this selection can, possibly, be 
guided by a model of reality (i.e. in this case a model for the probability distributions 
of h,9,v). Used in this manner, an accurate model will reduce the number of experi- 
ments required for certification and an inaccurate model will lead to a relatively greater 
number of experiments (but not to erroneous bounds). Subfigure 2.4(c) is based on the 
information contained in Ah and bounds on the variance of thickness (obtained from 
the first experiment). Our selection as second experiment is to measure the mean of 
thickness (leading to Subfigure 2.4(d)). 

3 Generalizations and Comparisons 

3.1 Prediction, extrapolation, verification and validation 

In the previous section, the OUQ framework was described as it applies to the the 
certification problem (1.1). We will now show that many important UQ problems, such as 
prediction, verification and validation, can be formulated as certification problems. This 
is similar to the point of view of [5], in which formulations of many problem objectives 
in reliability are shown to be representable in a unified framework. 

A prediction problem can be formulated as, given e and (possibly incomplete) 
information on P and G, finding a smallest b — a such that 
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which, given the admissible set A, is equivalent to solving 



inflb-a inf fi{a < f(X) < b] > 1 - e 



(3.2) 



Observe that [a, b] can be interpreted as an optimal interval of confidence for G(X) 
(although b — a is minimal, [a, b] may not be unique), in particular, with probability at 
least 1 - e, G(X) € [a,b\. 

In many applications the regime where experimental data can be taken is different 
than the deployment regime where prediction or certification is sought, and this is com- 
monly referred to as the extrapolation problem. For example, in materials modeling, 
experimental tests are performed on materials, and the model run for comparison, but 
the desire is that these results tell us something where experimental tests are impossible, 
or extremely expensive to obtain. 

In most applications, the response function G may be approximated via a (possibly 
numerical) model F. Information on the relation between the model F and the response 
function G that it is designed to represent (i.e. information on (x, F(x),G(x))) can be 
used to restrict (constrain) the set A of admissible scenarios (G,P). This information 
may take the form of a bound on some distance between F and G or a bound on some 
complex functional of F and G [52, 78]. Observe that, in the context of the certification 
problem (1.1), the value of the model can be measured by changes induced on the 
optimal bounds C(A) and U{A). The problem of quantifying the relation (possibly the 
distance) between F and G is commonly referred to as the validation problem. In some 
situations F may be a numerical model involving millions of lines of code and (possibly) 
space-time discretization. The quantification of the uncertainty associated with the 
possible presence of bugs and discretization approximations is commonly referred to as 
the verification problem. Both, the validation and the verification problem, can be 
addressed in the OUQ framework by introducing information sets describing relations 
between G, F and the code. 

3.2 Comparisons with other UQ methods 

We will now compare OUQ with other widely used UQ methods and consider the certi- 
fication problem (1.1) to be specific. 

• Assume that n independent samples Y\, . . . ,Y n of the random variable G(X) are 
available (i.e. n independent observations of the random variable G(X), all dis- 
tributed according to the measure of probability P). If t\Y{ > a] denotes the 
random variable equal to one if Yi > a and equal to zero otherwise, then 



is an unbiased estimator of ¥[G(X) > a]. Furthermore, as a result of Hoeffding's 
concentration inequality [33], the probability that p n deviates from P[G(X) > n] 



Pn ■ = 



(3.3) 



n 
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(its mean) by at least e/2 is bounded from above by exp(— §e 2 ). It follows that if the 
number of samples n is large enough (of the order of t log g)> then the certification 
of (1.1) can be obtained through a Monte Carlo estimate (using p n ). As this 
example shows, Monte Carlo strategies [50] are simple to implement and do not 
necessitate prior information on the response function G and the measure P (other 
than the i.i.d. samples). However, they require a large number of (independent) 
samples of G(X) which is a severe limitation for the certification of rare events 
(the e = 10~ 9 of the aviation industry would [83, 15] necessitate O(10 18 ) samples). 
Additional information on G and P can, in principle, be included (in a limited 
fashion) in Monte Carlo strategies via importance and weighted sampling [50] to 
reduce the number of required samples. 

• The number of required samples can also be reduced to ^(ln^) d using Quasi- 
Monte Carlo Methods. We refer in particular to the Koksma-Hlawka inequality 
[64], to [82] for multiple integration based on lattice rules and to [81] for a recent 
review. We observe that these methods require some regularity (differentiability) 
condition on the response function G and the possibility of sampling G at pre- 
determined points X. Furthermore, the number of required samples blows-up at 
an exponential rate with the dimension d of the input vector X. 

• If G is regular enough and can be sampled at at pre-determined points, and if X has 
a known distribution, then stochastic expansion methods [30, 29, 101, 4, 26, 19] 
can reduce the number of required samples even further (depending on the regu- 
larity of G) provided that the dimension of X is not too high [94, 14]. However, 
in most applications, only incomplete information on P and G is available and 
the number of available samples on G is small or zero. X may be of high dimen- 
sion, and may include uncontrollable variables and unknown unknowns (unknown 
input parameters of the response function G). G may not be the solution of a 
PDE and may involve interactions between singular and complex processes such 
as (for instance) dislocation, fragmentation, phase transitions, physical phenomena 
in untested regimes, and even human decisions. We observe that in many appli- 
cations of Stochastic Expansion methods G and P are assumed to be perfectly 
known and UQ reduces to computing the push forward of the measure P via the 
response (transfer) function I> a oG (to a measure on two points, in those situations 
C{A) = ¥[G>a] = U{A)). 

• The investigation of variations of the response function G under variations of the 
input parameters Xj, commonly referred to as sensitivity analysis [75, 76], allows 
for the identification of critical input parameters. Although helpful in estimating 
the robustness of conclusions made based on specific assumptions on input pa- 
rameters, sensitivity analysis, in its most general form, has not been targeted at 
obtaining rigorous upper bounds on probabilities of failures associated with cer- 
tification problems (1.1). However, single parameter oscillations of the function 
G (as defined by (1.10)) can be seen as a form of non- linear sensitivity analysis 
leading to bounds on ¥[G > a] via McDiarmid's concentration inequality [57, 58]. 
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These bounds can be made sharp by partitioning the input parameter space along 
maximum oscillation directions and computing sub-diameters on sub-domains [91]. 

• If A is expressed probabilistically through a prior (an a priori measure of probabil- 
ity) on the set possible scenarios (/,//) then Bayesian inference [48, 7] could in 
principle be used to estimate P[G > a] using the posterior measure of probability 
on (/, fj,). This combination between OUQ and Bayesian methods avoids the ne- 
cessity to solve the possibly large optimization problems (2.4) and it also greatly 
simplifies the incorporation of sampled data thanks to the Bayes rule. However, 
oftentimes, priors are not available or their choice involves some degree of arbi- 
trariness that is incompatible with the certification of rare events. Priors may 
become asymptotically irrelevant (in the limit of large data sets) but, for small e, 
the number of required samples can be of the same order as the number required 
by Monte-Carlo methods [79]. 

When unknown parameters are estimated using priors and sampled data, it is im- 
portant to observe that the convergence of the Bayesian method may fail if the 
underlying probability mechanism allows an infinite number of possible outcomes 
(e.g., estimation of an unknown probability on N, the set of all natural numbers) 
[17]. In fact, in these infinite-dimensional situations, this lack of convergence (com- 
monly referred to as inconsistency) is the rule rather than the exception [18]. As 
emphasized in [17], as more data comes in, some Bayesian statisticians will become 
more and more convinced of the wrong answer. 

We also observe that, for complex systems, the computation of posterior probabil- 
ities has been made possible thanks to advances in computer science. We refer to 
)] for a (recent) general (Gaussian) framework for Bayesian inverse problems and 
[6] for a rigorous UQ framework based on probability logic with Bayesian updating. 
Just as Bayesian methods would have been considered computationally infeasible 
50 years ago but are now common practice, OUQ methods are now becoming fea- 
sible and will only increase in feasibility with the passage of time and advances in 
computing. 

• The combination of structural optimization (in various fields of engineering) to 
produce an optimal design given the (deterministic) worst-case scenario has been 
referred to as Optimization and Anti-Optimization [27] (we also refer to crit- 
ical excitation in seismic engineering [21]). The main difference between OUQ 
and Anti-optimization lies in the fact that the former is based on an optimization 
over (admissible) functions and measures (/,//), while the latter only involves an 
optimization over /. Because of its robustness, many engineers have adopted the 
(deterministic) worst-case scenario approach to UQ (we refer to chapter 10 of [27]) 
when a high reliability is required. It is noted in [27] that the reason why prob- 
abilistic methods do not find appreciation among theoreticians and practitioners 
alike lies in the fact that "probabilistic reliability studies involve assumptions on 
the probability densities, whose knowledge regarding relevant input quantities is 
central". It is also observed that strong assumptions on P may lead to GIGO 
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( "garbage in-garbage out" ) situations for small certification thresholds e when re- 
liability estimates and probabilities of failure are very sensitive to small deviations 
in probability densities. On the other hand, UQ methods based on deterministic 
worst-case scenarios are oftentimes "too pessimistic to be practical" [21, 27]. We 
suggest that by allowing for very weak assumptions on probability measures, OUQ 
methods can lead to bounds on probabilities of failure that are both reliable and 
practical. Indeed, when applied to complex systems involving a large number of 
variables, deterministic worst-case methods do not take into account the improba- 
bility that these (possibly independent or weakly correlated) variables conspire to 
produce a failure event. 

The certification problem (1.1) exhibits one of the main difficulties that face UQ 
practitioners: many theoretical methods are available, but they require assumptions or 
conditions that, oftentimes, are not satisfied by the application. More precisely, the 
characteristic elements distinguishing these different methods are the assumptions upon 
which they are based, and some methods will be more efficient than others depending 
on the validity of those assumptions. UQ applications are also characterized by a set of 
assumptions/information on the response function G and measure P, which varies from 
application to application. Hence, on the one hand, we have a list of theoretical methods 
that are applicable or efficient under very specific assumptions; on the other hand, most 
applications are characterized by an information set or assumptions that, in general, do 
not match those required by these theoretical methods. It is hence natural to pursue 
the development of a rigorous framework that does not add inappropriate assumptions 
or discard information. 

We also observe that the effectiveness of different UQ methods cannot be compared 
without reference to the available information (some methods will be more efficient than 
others depending on those assumptions). For the hypervelocity impact example of Sub- 
section 1.2, none of the methods mentioned above can be used without adding (arbitrary) 
assumptions on probability densities or discarding information on the mean value or in- 
dependence of the input parameters. We also observe that it is by placing information 
at the center of UQ that the proposed OUQ framework allows for the identification of 
best experiments. Without focus on the available information, UQ methods are faced 
with the risk of propagating inappropriate assumptions and producing a sophisticated 
answer to the wrong question. These distortions of the information set may be of limited 
impact on certification of common events but they are also of critical importance for the 
certification of rare events. 

3.3 OUQ with random sample data 

For the sake of clarity, we have started the description of OUQ with deterministic infor- 
mation and assumptions (i.e. when A is a deterministic set of functions and probability 
measures). In many applications, however, some of the information arrives in the form 
of random samples. The addition of such sample data to the available information 
and assumptions leads to non-trivial theoretical questions that are of practical impor- 
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tance beyond their fundamental connections with information theory and nonparametric 
statistics. In particular, while the notion of an optimal bound (2.4) is transparent and 
unambiguous, the notion of an optimal bound on ¥[G(X) > a] in presence of sample 
data is not immediately obvious and should be treated with care. This is a very delicate 
topic, a full treatment of which we shall defer to a future work. 

4 Reduction of OUQ Optimization Problems 

In general, the lower and upper values 



are each defined by a non-convex and infinite-dimensional optimization problem, the so- 
lution of which poses significant computational challenges. These optimization problems 
can be considered to be a generalization of Chebyshev inequalities. The history of the 
classical inequalities can be found in [42], and some generalizations in [13] and [98]; in the 
latter works, the connection between Chebyshev inequalities and optimization theory is 
developed based on the work of Mulholland and Rogers [G2], Godwin [32], Isii [37, 38, 39], 
Olkin and Pratt [65], Marshall and Olkin [54], and the classical Markov-Krein Theorem 
[42, pages 82 & 157], among others. The Chebyshev-type inequalities defined by C(A) 
and U(A) are a further generalization to independence assumptions, more general do- 
mains, more general systems of moments, and the inclusion of classes of functions, in 
addition to the probability measures, in the optimization problem. Moreover, although 
our goal is the computation of these values, and not an analytic expression for them, the 
study of probability inequalities should be useful in the reduction and approximation of 
these values. Without providing a survey of this large body of work, we mention the field 
of majorization, as discussed in Marshall and Olkin [55], the inequalities of Anderson 
[ ], Hoeffding [33], Joe [40], Bentkus et al. [11], Bentkus [9, 10], Pinelis [69, 70], and 
Boucheron, Lugosi and Massart [16]. Moreover, the solution of the resulting nonconvex 
optimization problems should benefit from duality theories for nonconvex optimization 
problems such as Rockafellar [7 ] and the development of convex envelopes for them, as 
can be found, for example, in Rikun [73] and Sherali [ ]. Finally, since Pardalos and 
Vavasis [67] show that quadratic programming with one negative eigenvalue is NP-hard, 
we expect that some OUQ problems may be difficult to solve. 

Let us now return to the earlier simple example of an admissible set A\ in (2.1): the 
(non-unique) extremizers of the OUQ problem with the admissible set .Ai all have the 
property that the support of the push-forward measure on M contains at most two 
points, i.e. is a convex combination of at most two Dirac delta measures (we recall 
that #supp(/*^) is the number of points forming the support of /*^): 



C{A) 



inf a[f(X)>a) 



U{A) 



sup fi[f(X) > a] 



sup n[f(X) > a] 



sup 

#supp(/, M )<2 



fi[f(X) > a]. 
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The optimization problem on the left-hand side is an infinite-dimensional one, whereas 
the optimization problem on the right-hand side is amenable to finite-dimensional parametriza- 
tion for each /. Furthermore, for each /, only the two values of / at the support points 
of the two Dirac measures are relevant to the problem. The aim of this section is to 
show that a large class of OUQ problems — those governed by independence and linear 
inequality constraints on the moments, — are amenable to a similar finite-dimensional 
reduction, and that a priori upper bounds can be given on the number of Dirac delta 
masses that the reduction requires. 

To begin with, we first show that an important class of optimization problems over 
the space of m-fold product measures can be reduced to optimization over products of 
finite convex combinations of Dirac masses (m is the number of random input variables). 
Consequently, we then show in Corollary 4.4 that OUQ optimization problems where 
the admissible set is defined as a subset of function-measure pairs (/, n) that satisfy gen- 
eralized moment constraints Gf(fx) < can also be reduced from the space of measures 
to the products of finite convex combinations of Dirac masses. Theorem 4.7 shows that, 
when all the constraints are generalized moments of functions of /, the search space Q 
of functions can be further reduced to a search over functions on an m-fold product of 
finite discrete spaces, and the search over m-fold products of finite convex combinations 
of Dirac masses can be reduced to a search over the products of probability measures on 
this m-fold product of finite discrete spaces. This latter reduction completely eliminates 
dependency on the coordinate positions in X. Theorem 4.7 is then used in Proposition 
4.8 to obtain an optimal McDiarmid inequality through the formulation of an appro- 
priate OUQ optimization problem followed by the above-mentioned reductions to an 
optimization problem on the product of functions on {0, l} m with the m-fold products 
of measures on {0, l} m . This problem is then further reduced, by Theorem 4.9, to an 
optimization problem on the product of the space of subsets (power set) of {0, l} m with 
the product measures on {0, l} m . Finally, we obtain analytic solutions to this last prob- 
lem for m = 1,2,3, thereby obtaining an optimal McDiarmid inequality in these cases. 
We also obtain an asymptotic formula for general m. Moreover, the solution for m = 2 
indicates important information regarding the diameter parameters D\ and D2 (we refer 
to Example 2.2). For example, if D2 is sufficiently smaller than D\, then the optimal 
bound only depends on D\ and therefore, any decrease in D2 does not improve the 
inequality. See Subsection 10.1 for the proofs of the results in this section. 



4.1 Reduction of OUQ 

For a topological space X, let Fx (or simply F) denote the space of real-valued (Borel) 
measurable functions on X, and let Ai(X) denote the set of Borel probability measures 
on X. Denote the process of integration with respect to a measure [i by E„, and let 



A k (X) :-- 




6 X, aj > for j = 0, . . . , k and aj = 1 



3=0 
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denote the set of (k + l)-fold convex combinations of Dirac masses. When X = \^i=\ 
is a product of topological spaces, and we speak of measurable functions on the product 
X, we mean measurable with respect to the product cr-algebra and not the Borel a- 
algebra of the product. For more discussion of this delicate topic, see e.g. [41]. The 
linear equality and inequality constraints on our optimization problems will be encoded 
in the following measurable functions: 

5$: *^Rfor j = 1, . . . ,n' , 

and, for each i = 1, . . . , m, 

g] : Xi ->• R for j = 1,.. . ,m. 

Let A4 G C Ai m (X) denote the set of products of Borel measures for which these all these 
functions are integrable with finite integrals. We use the compact notation G(/i) < to 
indicate that \i G Ai G and that 

®M<0, for all j = l,...,n', 

E M [5j] < 0, j = 1, . . . , for alH = 1, . . . , m . 

Moreover, let r: X — > R be integrable for all fi G Ai G (possibly with values +oo or 
-co). For any set M C M G , let 

U{M) := sup E„[r], 
fj,eM 

with the convention that the supremum of the empty set is — oo. 

For a measurable function /, the map /i i— > may not be defined, since / may 

not be absolutely integrable with respect to /i. If it is defined, then it is continuous in 
the strong topology on measures; however, this topology is too strong to provide any 
compactness. Moreover, although [ , Theorem 14.5] shows that if / is a bounded upper 
semi-continuous function on a metric space, then integration is upper semi-continous in 
the weak-* topology, we consider the case in which X may not be metric or compact, 
and the functions / may be unbounded and lack continuity properties. The following 
results heavily use results of Winkler [100] — which follow from an extension of Choquet 
Theory (see e.g. [68]) by von Weizsacker and Winkler [ , Corollary 3] to sets of prob- 
ability measures with generalized moment constraints — and a result of Kendall [45] 
characterizing cones, which are lattice cones in their own order. These results generalize 
a result of Karr [43] that requires X to be compact, the constraint functions be bounded 
and continuous, and the constraints to be equalities. The results that follow are remark- 
able in that they make extremely weak assumptions on X and no assumptions on the 
functions /. Recall that a Suslin space is the continuous image of a Polish space. 

Theorem 4.1. Let X = YliLi be a product of Suslin spaces and let 

in 

M m (X) :=Q§M(Xi) 

2=1 
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denote the set of products of B or el probability measures on the spaces Xi. As above, 
consider the generalized moment functions G and the corresponding finite moment set 
Ai G . Suppose that r: X — > R is integrable for all \i G M G (possibly with values +00 or 
—00 ). Define the reduced admissible set 



Theorem 4.1 says that, on a product X of very general spaces Xi, optimization prob- 
lems constrained by n' linear moment constraints on X and n« linear moment constraints 
on each factor space Xi achieve their optima among those product measures whose i th 
marginal has support on at most n' + rij + 1 points of Xi. 

Remark 4.2. Using [99, Corollary 3], this theorem and its consequences below easily 
generalize from the situation where IE^fg^] < for each k to that in which E^fg^] G 1^ for 
each k, where k indexes the constraint functions, and where each 1^ is a closed interval. 
Consequently, such pairs of linear constraints introduce a requirement for only one Dirac 
mass, not the two masses that one might expect. Moreover, observe that the condition 
that the function r is integrable (possibly with values +00 or —00) for all /i G A4 G is 
satisfied if r is non-negative. In particular, this holds when r is an indicator function of 
a set, which is our main application in this paper. 

Remark 4.3. Theorem 4.1 and its consequents below can be expressed more generally 
in terms of extreme points of sets of measures, whereas in the above case, the extreme 
points are the Dirac masses. To that end, Dynkin [24] describes more general sets of 
measures and their extreme points, which can be useful in applications. In particular, 
one could consider 

1. sets of measures that are invariant under a transformation (the extreme points are 
the ergodic measures); 

2. symmetric measures on an infinite product space (the extreme points are the simple 
product measures); 

3. the set of stationary distributions for a given Markov transition function; 

4. the set of all Markov processes with a given transition function. 

We now apply Theorem 4.1 to obtain the same type of reduction for an admissible 
set A C T x J\A m (X) consisting of pairs of functions and product measures — this is the 
case for the OUQ optimization problems C(A) and U(A). Let Q C T denote a subset of 
real-valued measurable functions on X and consider an admissible set A C Q x M m {X) 
defined in the following way. For each / £ Q, let G(f, •) denote a family of constraints 
as in Theorem 4.1 and Remark 4.2. For each / G Q, let M G f C M m (X) denote 
those product probability measures [i such that the moments G(/, jj) are well-defined. 




Then, it holds that 



U(M G ) =U(M A )- 



37 



Moreover, for each / 6 Q, let rj : X — > R be integrable for all fi G 7W G/ (possibly with 
values +00 or —00). Define the admissible set 



A := G <? x M m (X) I G(/,/x) < 0} 

and define the OUQ optimization problem to be 

U(A):= sup E„[r f ]. 



(4.1) 



(4.2) 



Corollary 4.4. Consider the OUQ optimization problem (4.2) and define the reduced 
admissible set Aa Q A by 



Aa := <(f,ti)€Gx (g)A n . +n ,(^; 
I i=i 



g(/,m)<o 



(4.3) 



Then, it holds that 



U{A) = U(Aa)- 



Remark 4.5. Corollary 4.4 is easily generalized to the case where for each / 6 Q, i, 
and fixed 7^ i, G(/, /Ui, .., /Uj, .., // m ) has affine dimension at most mi as [ii varies. 
In this case 

Aa ■■= ((,/» G Q x ®A mi (^) G(/, M ) < o| . 

Remark 4.6. Linear moment constraints on the factor spaces Xi allow to consider 
information sets with independent random variables X\ , . . . , X m and weak constraints 
on the probability measure of the variables Xi. An example of such an admissible set is 
the one associated with Bernstein inequalities [12], in which a priori bounds are given 
on the variances of the variables X; . 



4.2 Generalized moments of the response function 

We now consider the case where the function rj := r o f is defined through composition 
with a measurable function r, and all n constraints are determined by compositions 
g'j := gj o /, with j = 1, . . . , n, of the function /. Hence, the symbol G(/, fi) will mean 
that all functions gj o f are /u integrable and will represent the values E^[gj o /] for 
j = 1, ... ,n. That is, we have the admissible set 

A ■= {(/, M) G G x M m {X) I G(/, fj) < 0} (4.4) 
and the optimization problem 

U(A) := sup E„[ro/] (4.5) 

as in (4.2). However, in this case, the fact that the criterion function r o f and the 
constraint functions gj°f are compositions of the function / permits a finite-dimensional 
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reduction of the space of functions Q to a space of functions on {0, . . . , n} m and a 
reduction of the space of m-fold products of finite convex combinations of Dirac masses 
to the space of product measures on {0, . . . ,n} m . This reduction completely eliminates 
dependency on the coordinate positions in X. 

Formulating this result precisely will require some additional notation. By the Well- 
Ordering Theorem, there exists a well-ordering of each X{. Suppose that a total ordering 
of the elements of the spaces Xi for i = 1, . . . , m is specified. Let J\f := {0, . . . , n} and 
V := {0, . . . ,n} m = N m . Every element fi G <^f =l A n (Xi) is a product n = (g)^ m 
where each factor \n is a convex sum of n + 1 Dirac masses indexed according to the 
ordering; that is, 

n 

Hi = a l k 5 x k 

k=0 

for some a\, . . . , a l n > with unit sum and some xj, . . . , € X% such that, with respect 
to the given ordering of Xt, 

xj < x\ < ■ ■ ■ < xf. 

Let J-j) denote the real linear space of real functions on T> = {0, . . . , n} m and consider 
the mapping 

m 

¥: T x A n (Xi) — >• Tt> 
i=i 

defined by 

( F (/>^)) (ii,i2,- ■ ■,im) = f(x\\x l 2 2 ,.. .,x%), i k eJ\f,k = l,...,m. 

¥ represents the values of the function / at the Dirac masses in h, but does not carry 
information regarding the positions of the Dirac masses or their weights. 

Theorem 4.7. Consider the admissible set A and optimization problem U(A) defined 
in (4.4) and (4.5) where r o f is integrable (possibly with values +oo or — oo ) for all 
product measures. For a subset Qjy Q J~v, define the admissible set 

A v = {(h,a) £Q V x M m (V)\E a [ gi oh] < for all j = 1, . . . ,n} (4.6) 

and the optimization problem 

U(At>) ■■= sup E a [r o h}. 

If 

w(gx 0A n (^)j =g v , 

then it holds that 

U{A)=U{A V ). 

When the constraint set also includes functions which are not compositions with 
/, then Theorem 4.7 does not apply. Although it does appear that results similar to 
Theorem 4.7 can be obtained, we leave that as a topic for future work. 
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4.3 Application to McDiarmid's inequality 

Theorem 4.7 can be applied to the situation of McDiarmid's inequality in order to obtain 
an optimal solution for that problem. Let Di > for i = 1, . . . , m and define 

Q ■= {/ g T | Osci(/) < Di for each i = l m}, (4.7) 

where 

Osci(/) := sup sup |/(. .. ,Xi, ...)-/(... . .)| . 

(xi,...,xm.)ex x'^Xi 

We have a product probability measure P on X and a measurable function H: X — ¥ R 
such that H £ Q. Suppose that we have an upper bound 

P[# - E P [fT| > a] < H(o, a) for all H £ G. (4.8) 

It follows that if H £ Q and E P [fl] < 0, then 

P[il > a] < P[# - E P [fl] > a] < H(a, G) for all F € Q with E P [fl] < 0. 

On the other hand, suppose that 

F[H >a]< W'(a, Q) for all H G Q with E P [iT] < 0. (4.9) 

It follows that 

F[H >a}< m'(a, G) for all H G Q with E P [H] = 0. 

Since the constraints Q and the event H— E P [fZ"] > a are invariant under scalar translation 
i — y H ~\~ c it follows that 

F[H - E ¥ [H] >a}< m'(a, G) for all H G Q. 

That is, the inequalities (4.8) and (4.9) are equivalent. 

McDiarmid's inequality [57, 58] provides the bound M(a,G) '■= exp(— for (4.8) 
and its equivalent (4.9), with 

m 

D 2 :=Y,Dl (4.10) 

i=l 

Define the admissible set corresponding to the assumptions of McDiarmid's inequality: 
AicD = {(/, y) G Q x M m (X) | K^\f\ < 0} , (4.11) 
and define the optimization problem 

Z-Mmcd) := sup n[f > a}. (4.12) 

(/,/i)e^lMcD 
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Since (H,F) G -4mcD and McDiarmid's inequality /j,[f > a] < exp(— ^-) is satisfied for 
all (/,//) G -4,McD, it follows that 



F[H >a}< U(Amcd) < exp (~%) ■ 



Moreover, the inequality on the left is optimal in the sense that, for every e > 0, there 
exists a McDiarmid-admissible scenario (/, fi) satisfying the same assumptions as (H, P) 
such that fi[f > a] > U(AmcT>) — £■ 

To apply the previous results to computing W(.Amcd)j let V := {0, l} m and define 

Qt> '■= {h G Tv | OsCk(h) < Dk for each k = 1, . . . , m}, 

where the inequality Oscfc(/i) < A for /i G J 7 © means that 

\h(si, . . . , s k , . . . , s m ) ~ h(s±, ...,s k i,...,s m )\ < A, (4.13) 

for all Sj G {0, 1}, j = 1, . . . , m, and all s^' G {0, 1}. Define the corresponding admissible 
set 

A v = {(h, a)eG v x M({0, l}) m | E a [h] < 0} (4.14) 
and the optimization problem 

U(A V ) := sup a[h > a]. (4.15) 

(h,a)€A-D 

Proposition 4.8. It holds that 

U(A Mc d)=K(Av). (4.16) 

We now provide a further reduction of U(Amcd) by reducing U(Av)- To that end, 
for two vertices s and t of V = {0, 1}"\ let /(s, t) be the set of indices % such that s\ / tj. 
For s G P, define the function /i s G J 7 © by 



h s (t) = a- ^ D i- 
iei(s,t) 



For CCD, define h c G T v by 



/i c (i) := max/i s (t) = a - min V A- (4.17) 

i6/(s,t) 

Let C := {C | (7 C D} be the power set of V (the set of all subsets of V), define the 
admissible set Ac by 

Ac := {(C,a)€Cx M({0, l}) m I ^a[h C ] < 0} (4.18) 

and consider the optimization problem 

U{Ac) := sup a{h c > a). (4.19) 
(C,a)eAc 
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Theorem 4.9. It holds that 

U{Av)=U{A c ). (4.20) 

Remark 4.10. The proof of this reduction theorem utilizes the standard lattice struc- 
ture of the space of functions Tt> m a substantial way. To begin with, the reduction to 
max/i = a is attained through lattice invariance. Moreover, we have a lattice J-x>, with 
sub-lattice Qx>, and for each C G C, the set Cx> ■= {h € Ft> \ {s \ h(s) = a} = C}} 
of functions with value a precisely on the set C is a sub-lattice. For a clipped h, let 
C(h) := {s E T> | h{s) = a} be the set where h has the value a. If for each C the set 



fl {f <h}nc v ng v 



h:C(h)=C 

is nonempty, then we obtain a reduction. However, not only is the set nonempty, but 
the map C i— > h c is a simple algorithm that produces a point in this intersection, and 
therefore an explicit reduction. We suspect that the existence of a simple reduction 
algorithm in this case is due to the lattice structures, and that such structures may be 
useful in the more general case. Indeed, the condition f < h implies that E a [/] < E a [/t] 
for any a, and the the condition that E a [/] < E a [/i] for all a implies that / < h, so that 
the above condition is equivalent to the non-emptiness of 



f| ^f|{E«[/] <E a [/i]U nc v ng v . 

h:C(h)=C {a J 

For the more general constraints, we would instead have to solve (i.e. find an element 
of) 

f| lf]{G(f,a)<G(h,a)}\nc v ngv 

h:C(h)=C {a J 

Remark 4.11. We refer to the following diagram for a summary of the relationships 
between admissible sets A, A a, At>, Ac, -4mcD> the reduction theorems and their as- 
sumptions. 



AicD(4.1i; 



Proposition 4.8 ' 



,4(4.4) 

Corollary 4.4 

A\(4.3) 
Theorem 4.7 
(4.6)^(4.14) 
Theorem 4.9 

A:(4.18) 



(/, m) e Q x M(Xt), n = »T=ttM 



n and rii generalized moment constraints on \i and m 

y 

/ii reduces to the weighed sum of n + m + 1 Diracs 



Quantity of interest r o /, n constraints E^lgj o /] < 

/ and /J, reduce to a function and a measure on a finite set 
(h, a) G Gt> x M m (V), V = {0, . . . , n} m 



The space of functions Qp has a lattice structure 
Y 

Functions h can be parameterized by a finite set 
(C,q) 6 C x M{{0,l}) m , C= {C | C C {0, l}" 1 } 
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5 Optimal Concentration Inequalities 



In this section, the results of Section 4 will be applied to obtain optimal concentration 
inequalities under the assumptions of McDiarmid's inequality and Hoeffding's inequality. 
The following subsection gives explicit concentration results under the assumptions of 
McDiarmid's inequality, and Subsection 5.2 gives explicit concentration results under 
the assumptions of Hoeffding's inequality. 

Surprisingly, these explicit results show that, although uncertainties may propagate 
for the true value of G and P, they might not when the information is incomplete on G 
and P. 

We refer to Subsection 10.2 for the proofs of the results in this section. 

5.1 Explicit solutions under the assumptions of McDiarmid's inequal- 
ity 

In this subsection, we will apply Theorem 4.9 to obtain explicit formulae for the OUQ 
problem U(Amcd) (defined in Equation (4.12)) under the assumptions of McDiarmid's 
inequality (4.11). More precisely, we will compute U{Ac) defined by equation (4.19) 
and use equalities (4.20) and (4.16) to obtain U{Amcd) = U(Ae). Observe that all the 
following optimization problems possess solutions because they involve the optimization 
of a continuous function (with respect to a) in a compact space. 

Since the inequalities (4.8) and (4.9) are equivalent, it follows that 

U(Amcd)= sup /x[/>a + E M [/]]. 
(f,lx)e9xM m 

In particular, if E„[/] < is replaced by E^[/] < b or E„[/] = b in McDiarmid's inequality 
assumptions (4.11), then the results given in this section remain valid by replacing a by 
M := a — b (observe that M plays the role of a margin). 

Those results should be compared with McDiarmid's inequality [57, 58], which pro- 
vides the bound 

sup n[f > a + Eftlfl] < exp f- ^° =$ ) • (5.1) 

The statements of the theorem will be given assuming that a > 0; in the comple- 
mentary case of a < 0, the solution is simply U(Amcd) = 1- 

To the best of the authors' knowledge, the optimal bounds given here are new. 
There is a substantial literature relating to optimization of concentration bounds and 
de-randomization algorithms (see for instance [85] and references therein) but, as far as 
the authors know, those bounds were suboptimal because they were obtained through 
the moment generating function technique. 
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5.1.1 Explicit solutions in dimensions one and two 

Theorem 5.1 (Explicit solution for m = 1). For m = 1, U(AmcT>) is given by 



U{A 



McD ) 




if Di < a, 
ifO<a<D 1 . 



(5.2) 



Theorem 5.2 (Explicit solution for m = 2). For m = 2, U(AmcT>) is given by 

if D 1 + D 2 < a, 

2 

-, if\D 1 -D 2 \ <a<D 1 + D 2 , 



U{A 



McD J 



o, 

(Di + D 2 



1 



^B X B 2 
a 



(5.3) 



max(_Di, D 2 ) 



if0<a< \Di - D 2 \. 



See Sub- figures 5.2(a), 5.2(b) and 5.2(c) for illustrations comparing the McDiarmid 
and OUQ bounds for m = 2 (as functions of (Di,D 2 ), with mean performance and 
failure threshold a = 1, the OUQ bound is calculated using the explicit solution (5.3)). 
Observe that 

• If a < D\ — D 2 , then a decrease in D 2 does not lead to a decrease in the OUQ 
bound U (.4mcd)- In other words, if most of the uncertainty is contained in the first 
variable (a + D 2 < D\), then the uncertainty associated with the second variable 
does not affect the global uncertainty; a reduction of the global uncertainty requires 
a reduction in D\ . 

• For Di + D 2 = 2a, the ratio between the OUQ bound and the McDiarmid bound 
is minimized near the diagonal. 

Remark 5.3. The maximum of (5.3) over D\,D 2 under the constraints D\ + D 2 = D 
and D\ > D 2 is achieved at D 2 = and is equal to 1 — a/D. The minimum of (5.3) over 
D\ , D 2 under the constraints D\ + D 2 = D and D\ > D 2 is achieved on the diagonal 
D\ = D 2 and is equal to (1 — a/D) 2 . 



5.1.2 Explicit solution in dimension three 

Assume that D 1 > D 2 > D 3 . Write 



0, 

(gi + D 2 + D 3 - 

27DiL> 2 D 3 
(Di +D 2 - a) 2 

±D X D 2 
Di 



if B x + D 2 + D 3 < a, 

if D 1 + D 2 -2D 3 <a<D 1 + D 2 + D 3 , 

\{D 1 -D 2 <a<D 1 +D 2 - 2D 3 , 
if < a < Di - D 2 . 



(5.4) 
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Figure 5.1: Comparison of the McDiarmid and OUQ bounds with zero mean performance 
and failure threshold a = 1. 




(c) Ratio of the two bounds: OUQ bound 
divided by McDiarmid bound, m = 2 

F and F versus D for a=1 and D =D =D 




0.5 1 1.5 2 2.5 3 

D 



(d) McDiarmid vs OUQ bound, m = 3 and 



F 1 and F ? versus D ( for a=1 and 0^ =1.5*0 

0.7 1 , , , , 




-0.1 1 1 1 1 1 1 1 

0.5 1 1.5 2 2.5 3 

D, 



(e) T x vs F 2 , D 1 = D 2 = D 3 



(f) 7i vs^ 2 , £>i =D 2 



D 3 



45 



and 



T 2 ■= max 4>(ji)ip(ji) (5.5) 
ie{l,2,3} 



where 

^- 2 ( 2 S- I )- 2 < 3 |- 1 ) + T^( 8 S- 2 i 

and 71,72,73 are the roots (in 7) of the cubic polynomial 

(1+7) 3 - ^(1 + l) 2 + B = 0, (5.6) 

where 

A:= bD r 2 ^ and B 



2D 2 - D 3 2D 2 - D 3 

Define a function <j> by 



where 



1, if 7 G (0,1) and 0(7) G (0,1), 
0, otherwise, 



of \ _ a Z?2 1 — 7 

(7j: " "D 3 (l-7 2 ) + ^3~T+7' 

By the standard formula for the roots of a cubic polynomial, the roots of (5.6) are 
given by 

1 / , 

7i : = -1 - g l-- 4 + K i + K 2) , 

1 . . 

72 := -1 - g (-A + a; 2 Ki +Wi« 2 ), 

73 := -1 - g (~ A + + ^2^2) , 



where 



1 , V3. 1 V3. /^i + V&V 

Wi:= -2 + "2"*' W2:= "2-^' Ki:= 
1 

K2 := f^^)', ft := -2A 3 + 275 and /3 2 := ft 2 - 4A 6 . 



Since there are 3 possible values for each cube root, K\ and k 2 must be taken so that 
they satisfy k±k 2 = A 2 . 

Theorem 5.4 (Explicit solution for m = 3). For m = 3 with D\ > Z?2 > -D3, W(^4mcd) 
is given by 

U(Au.cV>) = max(Ji, T 2 ). (5.7) 
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Remark 5.5. Sub-figure 5.2(d) compares the McDiarmid and OUQ bounds for m = 3, 
with zero mean performance, D\ = D2 = D3, and failure threshold a = 1. Sub- figure 
5.2(e) shows that that T2 > J~i for D\ large enough. Sub-figure 5.2(f) shows that if 
D x = D 2 = §L> 3 , then T 2 < T\ for all D x . Therefore, Sub-figures 5.2(e) and 5.2(f) 
suggest that the inequality Ti > F\ holds only if D3 « D 2 > and D2 is large enough 
relative to D\. 

Remark 5.6. For the application to the (SPHIR facility) admissible set (1.9) (described 
in Subsection 1.2.2), the sub-diameters of the surrogate H are: 8.86 mm 2 for thickness 
(.Di), 7.20mm 2 for velocity (D2), and 4.17mm 2 for obliquity (-D3). These values have 
been obtained by solving the optimization problems defined by (1.10) with f = H and i = 
1, 2, 3. The application of Theorem 5.4 with these sub-diameters and a = 5.5 mm 2 leads 
to T2 = 0.253 and T\ = 0.437 (see (5.4) and (5.5) for the definition and interpretation 
of T\ and T?). In particular, since D\ — D2 < a < D\ + D2 — 2D3, it follows from (5.4) 
that the obliquity sub-diameter does not impact T\ (decreasing D3 down to zero does 
not change the optimal bound 43.7% obtained from the third line of (5.4)). 



5.1.3 Solution in dimension m 

For Co G C, write 



sup a[h Co > a], 

a:{C ,a)£A c 



where h Co is defined by equation (4.17). 

Proposition 5.7. Assume that D\>--> D m -\ > D m . For Co := {(1,1,. 
it holds that 



(5.8) 



,1,1)}, 



U{A Co ) 



0, 



(E 



k D 

j=l U 3 



a 



if Ejli Dj ~ mD m <a< XlJLi Dj, 

if, for k e {1,... ,m- 1}, 

E*=i Dj -kD k <a< £ )t\ Dj - (k + l)D k+1 . 



(5.9) 



Remark 5.8. The maximum of (5.9) over D±, . . . , D m under the constraints T)\ H h 

D m = D and D\ > ■ ■ ■ > D m is achieved at D\ = D and is equal to 1 — a/D. 

The minimum of (5.9) over D\, . . . , D m under the constraints D\ + • • • + D m = D and 

£>!>•••> D m is achieved on the diagonal D\ — • • • — D m and is equal to (1 — a/D) m . 

Proposition 5.9. Assume that D± > • • • > D m -\ > D m . If a > ^2*^=1 Dj + D m , then 
^("4-Mcd) is given by equation (5.9). 

Remark 5.10. It follows from the previous proposition that, in arbitrary dimension 
m, the tail of U(Amcd) with respect to a is given by (5.9). Although we do not have 
an analytic solution for m > 4 and a < Y2T=i Dj + D m , a numerical solution can be 
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obtained by solving the finite-dimensional optimization problem (4.19) with variables 
(C, a). Observe that the range of a is [0, l] m . Although the range of C is the set of 
subsets of {0, l} m , we conjecture (based on symmetry and monotonicity arguments) 
that the extremum of (4.19) can be achieved by restricting C to sets C q defined by 
{se[0,lp|£™ lSi >g(withgE{l,...,m}). ' 

5.2 Explicit solutions under the assumptions of Hoeffding's inequality 

This subsection treats a further special case of OUQ, where the assumptions are those 
of Hoeffding's inequality [35]. Define the admissible set 



and define the optimization problem 

U{A md ) 

By Hoeffding's inequality, for a > 0, 



f = Xi + ■■■ + X m , 
0€®£iA<([6i- DM), 
E„[/] < 



sup fi[f>a] 
(/,M)e^ H fd 



(5.10) 



W(^Hfd) < exp -2 



Theorem 5.11. If m = 2, then 



Hid 



McDj 



(5.11) 



Remark 5.12. Another proof of Theorem 5.11 can be obtained using entirely different 
methods than presented in Section 10.2. Although omitted for brevity, this method may 
be useful in higher dimensions, so we describe an outline of it here. We begin at the 
reduction obtained through Proposition 4.8 to the hypercube. Whereas the proof of 
Theorem 5.11 first applies the reduction of Theorem 4.9 to subsets of the hypercube, 
here we instead fix the oscillations in each direction to be < d% < Di, and solve the 
fixed d := (d\,dj) case, not using a Langrangian-type analysis but a type of spectral 
reduction. We then show that the resulting value U(d) is increasing in d with respect to 
the standard (lexicographic) partial order on vectors. The result then easily follows by 
taking the supremum over all vectors < d < D. 

Theorem 5.13. Let m = 3, and define J-\ and T2 as in Theorem 5.4- If J~i > Ti, then 



If T\ <Ti, then 



U(A mA ) =W(^mcd)- 
W(^Hfd) < W(^mcd)- 



(5.12) 
(5.13) 
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Under the assumptions of Hoeffding's inequality, each variable Xj is bounded from 
below and from above. Without the upper bounds on the variables Xi, it is possible 
to use additional reduction properties and conjecture an explicit form for the optimal 
inequality on n[X\ + ■ ■ ■ + X m > a]. Here we refer to the work and conjecture of Samuels 
[77] (see also [42, p. 542]), which has been proven true for m = 1,2,3. 

Remark 5.14. The optimal Hoeffding inequality can be used for additive models (with 
response functions of the form X\ + • • • + X m ) but also to obtain optimal probabilities 
of deviations for empirical means. Furthermore the fact that the optimal concentration 
inequalities corresponding to Hoeffding's or McDiarmid's assumptions are the same for 
m = 2 and possibly distinct for m = 3 is a simple but fundamental result analogous to 
Stein's paradox [25]. 

6 Computational Implementation 

In this section, we discuss the numerical implementation of OUQ algorithms for the 
analytical surrogate model for hypervelocity impact introduced in Subsection 1.2. 

6.1 Extreme points of reduced OUQ problems are attractors 

We consider again the computation of the optimal bound U{Ah) (where Ah is the in- 
formation set given by Equation (1.8)) via the identity (1.12) derived from the reduction 
results of Section 4. For #supp(//j) < 2, i = 1, 2, 3, Figure 1.3 has shown that numerical 
simulations collapse to two-point support. Figure 6.1 shows that, even when a wider 
search is performed (i.e., over measures /x G 0^=1 ^fc(^) f° r k > 1), the calculated 
maximizers for these problems maintain two-point support: the velocity and obliquity 
marginals each collapse to a single Dirac mass, and the plate thickness marginal col- 
lapses to have support on the two extremes of its range. As expected, optimization over 
a larger search space is more computationally intensive and takes longer to perform. 
This observation suggests that the extreme points of the reduced OUQ problems are, in 
some sense, attractors — this point will be revisited in the next subsection. 

We also refer to Figures 1.4 and 6.2 for plots of the locations and weights of the Dirac 
masses forming each marginal as functions of the number of iterations. Note that the 
lines for thickness and thickness weight are of the same color if they correspond to the 
same support point for the measure. In particular, Figure 6.2 shows that at iteration 
number 3500 the thickness support point at 62.5 mils (shown in Figure 6.1) has zero 
weight. 

6.2 Coagulation— Fragmentation algorithm for OUQ 

The results of Sections 4 and 5 give explicit a priori bounds on the number of Dirac masses 
sufficient to find the lower and upper bounds C(A) and U(A) when the admissible set 
A is given by independence and linear inequality constraints. However, it is possible 
that reduction properties are present for more general admissible sets A. Can such 
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Figure 6.1: For ^supp(/ij) < 5, i = 1,2,3, the maximizers of the OUQ problem (1.12) 
associated with the information set (1.8) collapse to two-point support. Velocity, obliq- 
uity and plate thickness marginals collapse as in Figure 1.3. At iteration 7100, the 
thickness support point at 62.5 mils has zero weight, as can be seen in Figure 6.2. 
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Figure 6.2: Time evolution of the genetic algorithm search for the OUQ problem (1.12) 
associated with the information set (1.8) for #supp(/ij) < 5 for i = 1,2,3, as optimized 
by mystic. Four of the five thickness support points quickly converge to the extremes of 
its range, with weights 0.024, 0.058, and 0.539 at 60 mils and weight 0.379 at 105 mils. 
The thickness support point that does not converge to an extremum has zero weight. 
Obliquity and velocity each collapse to a single support point, again with the corre- 
sponding weights demonstrating fluctuations due to degeneracies. 
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"hidden" reduction properties be detected by computational means, even in the absence 
of theorems that prove their existence? 

Consider again the results of the previous subsection. Theorem 4.1 provides an a 
priori guarantee that, to find U(A), it is sufficient to search the reduced feasible set A a, 
which consists of those G A whose marginal distributions each have support on at 
most two points. However, Figure 6.1 provides numerical evidence that something much 
stronger is true: even if we search among measures /i G ®f=i for k > 1, the 

measures collapse to an optimizer 

H* G Ai(Afi)® Ao(X 2 ) ® A (X 3 ) 

(that is, two-point support on the thickness axis, and one-point support on the obliquity 
and velocity axes). In some sense, the (small support) optimizers are attractors for 
the optimization process even when the optimization routine is allowed to search over 
measures with larger support than that asserted by Theorem 4.1. 

Therefore, we propose the following general algorithm for the detection of hidden 
reduction properties. Let an admissible set A be given; for k G N, let 

A k :={(f^)EA\^eA k (X)} 

be the collection of admissible scenarios such that fi has support on at most k + 1 points 
of X. 

1. Fix any initial value of k G N. 

2. Numerically calculate U(Ak) and obtain a numerical (approximate) maximizer 
H* G A k - 

3. Calculate #supp(;U*) and proceed as follows: 

• If #supp(;U*) < k + 1, then the measure has coagulated to have less-than- 
maximally-sized support and we terminate the algorithm. 

• If #supp( / u*) = k + 1, then no coagulation/reduction has yet been observed. 
We enter a fragmentation phase: replace k by any k' > k and return to step 
2. 

Remark 6.1. It would be more accurate to say that the above algorithm is a sketch of an 
algorithm, and that its details should be adjusted to fit the circumstances of application. 
For example, if the admissible set A includes an independence constraint, then it would 
be appropriate to base decisions upon the cardinality of the support of the marginal 
distributions of fj,*, not on the cardinality of the support of \i* itself. The termination of 
the algorithm if #supp(/U*) < k + 1 is motivated by supposition that a hidden reduction 
property has been found and that U(A) has an (approximate) optimizer in Ak- 

Remark 6.2. We reiterate the point made in Remark 4.3 that these methods apply to 
more general situations than finite convex combinations of Dirac measures; finite convex 
combinations of Dirac measures are simply a well-known class of geometrically extreme 
probability measures (with respect to which numerical integration happens to be very 
easy), and can be replaced by the extremal points of any class of probability measures 
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as required by the situation of study. For example, if the OUQ problem of interest 
involved the invariant measures for some measurable transformation T: X —> X, then 
each occurence of A^(X) above would be replaced by 



{k for each j = 0, . . . , k, aj > 0, 

^^ajfij Hj G M(X) is ergodic with respect to T, 
3=0 and Ylj=o a j = 1 




6.3 The OUQ algorithm in the mystic framework 

As posed above, OUQ at the high level is a global optimization of a cost function that 
satisfies a set of constraints. This optimization is performed in mystic using the differ- 
ential evolution algorithm of Price & Storn [72, 88], with constraints satisfied through a 
modified Lagrange multiplier method [60]. 

The mystic optimization framework [59] provides a collection of optimization al- 
gorithms and tools that lowers the barrier to solving complex optimization problems. 
Specifically, mystic provides flexibility in specifying the optimization algorithm, con- 
straints, and termination conditions. For example, mystic classifies constraints as either 
"bounds constraints" (linear inequality constraints that involve precisely one input vari- 
able) or "non-bounds constraints" (constraints between two or more parameters), where 
either class of constraint modifies the cost function accordingly in attempt to maximize 
algorithm accuracy and efficiency Every mystic optimizer provides the ability to apply 
bounds constraints generically and directly to the cost function, so that the difference in 
the speed of bounds-constrained optimization and unconstrained optimization is mini- 
mized. Mystic also enables the user to impose an arbitrary parameter constraint function 
on the input of the cost function, allowing non-bounds constraints to be generically ap- 
plied in any optimization problem. 

The mystic framework was extended for the OUQ algorithm. A modified Lagrange 
multiplier method was added to mystic, where an internal optimization is used to satisfy 
the constraints at each iteration over the cost function [ D]. Since evaluation of the cost 
function is commonly the most expensive part of the optimization, our implementation 
of OUQ in mystic attempts to minimize the number of cost function evaluations required 
to find an acceptable solution. By satisfying the constraints within some tolerance at 
each iteration, our OUQ algorithm will (likely) numerically converge much more quickly 
than if we were to apply constraints by invalidating generated results (i.e. filtering) at 
each iteration. In this way, we can use mystic to efficiently solve for rare events, because 
the set of input variables produced by the optimizer at each iteration will also be an 
admissible point in problem space — this feature is critical in solving OUQ problems, 
as tens of thousands of function evaluations may be required to produce a solution. We 
refer to [60] for a detailed description of the implementation of the OUQ algorithm in 
the mystic framework (we also refer to [61]). 

Remark 6.3. Our implementation of the OUQ algorithm in mystic utilizes a nested 
optimization (an inner loop) to solve an arbitrary set of parameter constraints at each 
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evaluation of the cost function. We use evolutionary algorithms because they are robust 
and especially suited to the inner loop (i.e., at making sure that the constraints are 
satisfied, local methods and even some global method are usually not good enough for 
this). We also note that the outer loop can be relaxed to other methods (leading to a 
reduction in the total number of function evaluations by an order of magnitude). Fi- 
nally, although we observe approximate extremizers that are "computationally" distinct 
(Figure 6.2 shows that mass is traded wildly between practically coincident points), we 
have not observed yet "mathematically" distinct extrema. 

Measures as data objects. Theorem 4.1 states that a solution to an OUQ problem, 
with linear constraints on marginal distributions, can be expressed in terms of products 
of convex linear combinations of Dirac masses. In our OUQ algorithm, the optimizer's 
parameter generator produces new parameters each iteration, and hence produces new 
product measures to be evaluated within the cost function. For instance, the response 
function H, as defined by H(h,9,v) in (1.5), requires a product measure of dimension 
n = 3 for support. In Example (1.8), the mean perforation area is limited to [mi, 7712] = 
[5.5, 7.5] mm 2 , the parameters h,9,v are bounded by the range provided by (1.7), and 
products of convex combinations of Dirac masses are used as the basis for support. The 
corresponding OUQ code parameterizes the Dirac masses by their weights and positions. 

More generally, it is worth noting that our computational implementation of OUQ 
is expressed in terms of methods that act on a hierarchy of parameterized measure 
data objects. Information is thus passed between the different elements of the OUQ 
algorithm code as a list of parameters (as required by the optimizer) or as a parameterized 
measure object. Mystic includes methods to automate the conversion of measure objects 
to parameter lists and vice versa, hence the bulk of the OUQ algorithm code (i.e. an 
optimization on a product measure) is independent of the selection of basis of the product 
measure itself. In particular, since the measure data objects can be decoupled from the 
rest of the algorithm, the product measure representation can be chosen to best provide 
support for the model, whether it be a convex combination of Dirac masses as required by 
Example (1.8), or measures composed of another basis such as Gaussians. More precisely, 
this framework can naturally be extended to Gaussians merely by adding covariance 
matrices as data object variables and by estimating integral moments equations (with a 
Monte Carlo method for instance). 

7 Application to the Seismic Safety Assessment of Struc- 
tures 

In this section, we assess the feasibility of the OUQ formalism by means of an applica- 
tion to the safety assessment of truss structures subjected to ground motion excitation. 
This application contains many of the features that both motivate and challenge UQ, in- 
cluding imperfect knowledge of random inputs of high dimensionality, a time-dependent 
and complex response of the system, and the need to make high-consequence decisions 
pertaining to the safety of the system. The main objective of the analysis is to assess the 
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safety of a structure knowing the maximum magnitude and focal distance of the earth- 
quakes that it may be subjected to, with limited information and as few assumptions as 
possible. 

7.1 Formulation in the time domain 
7.1.1 Formulation of the problem 

For definiteness, we specifically consider truss structures undergoing a purely elastic 
response, whereupon the vibrations of the structure are governed by the structural dy- 
namics equation 

Mu(t) + Cu{t) + Ku(t) = f(t), (7.1) 

where u(t) G R^ collects the displacements of the joints, M is the mass matrix, C is the 
damping matrix, K is the stiffness matrix and f(t) G M> N are externally applied forces, 
such as dead- weight loads, wind loads and others. The matrices M, C and K are of 
dimension N x N, symmetric and strictly positive definite. Let T be an N x 3 matrix 
such that: T$j = 1 if the ith degree-of- freedom is a displacement in the jth. coordinate 
direction; and Tjj = otherwise. In addition, let Uo(t) G R 3 be a ground motion. 
Then, Tuq{€) represents the motion obtained by translating the entire structures rigidly 
according to the ground motion. We now introduce the representation 

u(t) = Tu (t) + v(t), (7.2) 

where v(t) now describes the vibrations of the structure relative to its translated posi- 
tion. Inserting (7.2) into (7.1) and using KT = and CT = (implied by translation 
invariance), we obtain 

Mv{t) + Cv{t) + Kv(t) = f(t) - MTu (t), (7.3) 

where —MTuo(t) may be regarded as the effective forces induced in the structure by 
the ground motion (we start from rest). We shall assume that the structure is required 
to remain in the elastic domain for purposes of certification. Suppose that the structure 
has J members and that all the external loads are applied to the joints of the structure. 
Let L be a J x N matrix such that the entries of the vector Lv give the axial strains of 
the members. The certification condition is, therefore, 

Halloo < Si, i = l,...,J, (7.4) 

where Si is the yield strain of the ith member and ||/||oo := ess sup |/| is the L°°-norm 
of a function / : R — > R. In what follows we will write 

Yi = Uv i = l,...,J, (7.5) 

for the member strains. Due to the linearity of the structure, a general solution of (7.3) 
may be formally obtained by means of a modal analysis. Thus, let q a G M N and oj a > 0, 
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a = 1,...,N, be the eigenvectors and eigenfrequencies corresponding to the symmetric 
eigenvalue problem (K — u}"^M)q a = 0, normalized by q^Mq a = 1. Let 

N 

v(t) = ^2v a (t)q a (7.6) 

a=l 

be the modal decomposition of v(t). Using this representation, the equation of motion 
(7.3) decomposes into the modal equations 

v a (t) + 2( a oj a v a (t) + io 2 a v a (t) = ql{f{t) - MTuo(t)), (7.7) 

where we have assumed that the eigenmocles Qq, are also eigenvectors of C and ( a is the 
damping ratio for mode i. The solution of (7.7) is given by the hereditary integral 

v a (t) = - [ e~W*-r) sin[Ua{t _ r )](^MTn (r)) — , (7.8) 

JO u a 

where, for simplicity, we set / = and assume that the structure starts from rest and 
without deformation at time t = 0. We can now regard structures oscillating under 
the action of a ground motion as systems that take the ground motion acceleration 
uo(t) as input and whose performance measures of interest are the member strains Yi, 
i = 1, . . . , J. The response function F mapping the former to the latter is given by 
composing (7.8), (7.6) and (7.5). 

7.1.2 Formulation of the information set 

In order to properly define the certification problem we proceed to define constraints on 
the inputs, i.e. the information set associated with the ground motion acceleration. As 
in [87], we regard the ground motion at the site of the structure as a combination of 
two factors: the earthquake source s and the earth structure through which the seismic 
waves propagate; this structure is characterized by a transfer function Let * denote 
the convolution operator; the ground motion acceleration is then given by 

ito(t) := (V*s)(t)- (7.9) 

We assume that s is a sum of boxcar time impulses (see [< ] page 230) whose am- 
plitudes and durations are random, independent, not identically distributed and of un- 
known distribution. More precisely, we assume that 

B 

s{t):=Y J X l s i {t), (7.10) 
i=i 

where X%, . . . , Xb are independent (not necessarily identically distributed) random vari- 
ables with unknown distribution with support in [— a max , a max ]^ (sj, B and a m ax are 

de- 
fined below) and such that EpQ] = 0. We also assume the components (-X^i, X^2, Xi,3) 
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of the vectors Xj to be independent. Since we wish to bound the probability that a struc- 
ture will fail when it is struck by an earthquake of magnitude Ml in the Richter (local 
magnitude) scale and hypocentral distance R, we adopt the semi-empirical expression 
proposed by Esteva [28] (see also [63]) for the maximum ground acceleration 

a AM L 



"max : — j jj , r>\2 ' (^•H) 



(Ro + Rf 

where ao, A and i?o are constants. For earthquakes on firm ground, Esteva [28] gives 
a = 12.3 • 10 6 m 3 • s~ 2 , A = 0.8 and R = 25 • 10 3 m. 

The functions Sj are step functions, with Si(t) equal to one for ^}=i T j — * < 
J2)=i T j an d equal to zero elsewhere, where the durations t\,...,tb are independent 
(not necessarily identically distributed) random variables with unknown distribution 
with support in [0, T max ] and such that f\ < E[rj] < T2- Observing that the average 
duration of the earthquake is ^[ r «] an d keeping in mind the significant effect of 

this duration on structural reliability [97], we select f\ = Is, = 2s, r max = 6s, and 
B = 20. 

The propagation through the earth structure gives rise to focusing, de-focusing, re- 
flection, refraction and anelastic attenuation (which is caused by the conversion of wave 
energy to heat) [87]. We do not assume the earth structure to be known, henceforth we 
assume that tp is a random transfer function of unknown distribution. More precisely, 
we assume that the transfer function is given by 

i=l 

where q := 20, r' = 10 s, c is a random vector of unknown distribution with support in 
{x E [—1, l] 9 | J2i=i x f ^ 1 an d Yli=i x i = 0} an d <Pi is a piecewise linear basis nodal 
element on the discretization ti,...,t q of [— r'/2, r'/2] with — £» = r'/q (tfi(tj) = 5ij, 
with dij = 1 if i = j and zero otherwise), ip has the dimension of 1/time and the 

constraint Ya=i c 1 — 1S equivalent to the assumption that (^7 J^ r ^ 2 |V>| 2 (i) di) ^ is, 
with probability one, bounded by a constant of order 1/r'. Analogously to the Green 
function of the wave operator, if) can take both positive and negative values (in time, for 
a fixed site and source) . Observe also that the constraint on the time integral of tp 2 leads 
to a bound on the Arias intensity (i.e., the time integral of (iio) 2 ), which is a popular 
measure of ground motion strength used as a predictor of the likelihood of damage 
to short-period structures [86]. The constraint YH=i c i = ensures that the residual 
velocity is zero at the end of the earthquake. Observe also that, since the maximum 
amplitude of s already contains the dampening factor associated with the distance R 
to the center of the earthquake (in 1/(Rq + R) 2 , via (7.11)), ip has to be interpreted as 
a normalized transfer function. Since propagation in anisotropic structures can lead to 
changes in the direction of displacements, the coefficients q should, for full generality, 
be assumed to be tensors. Although we have assumed those coefficients to be scalars 
for the clarity and conciseness of the presentation, the method and reduction theorems 
proposed in this paper still apply when those coefficients are tensors. 
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7.1.3 The OUQ optimization problem 

The optimal bound on the probability that the structure will fail is therefore the solution 
of the following optimization problem 

U{A) := sup n[F < 0], (7.13) 

where A is the set of pairs (F, fi) such that (1) F is mapping of the ground accelera- 
tion t i y uo(t) onto the margin minj = i ) ... j j(5i — ||li||oo) via equations (7.8), (7.6) and 
(7.5). (2) fi is a probability measure on the ground acceleration t — > uo(t) with support 
on accelerations defined by (7.9), (7.10), (7.12) (with B = 20). Under this measure, 
X\, . . . , Xb, t\,...,tb, c are independent (not necessarily identically distributed) ran- 
dom variables. For i = 1,...,B, Xi has zero mean and independent (not necessarily 
identically distributed) components X^, ^,3) with support in [— a max , a max ], the 

measure of Tj is constrained by f\ < E[Tj] < T2 and has support in [0, r max ]. The support 
of the measure on c is a subset of {x 6 [—1, l] q : J2i=i xf < 1&: J2i=i x i = 0}- 

7.1.4 Reduction of the optimization problem 

Problem (7.13) is not computationally tractable since the optimization variables take 
values in infinite-dimensional spaces of measures. However, thanks to Corollary 4.4, we 
know that the optimum of Problem (7.13) can be achieved by (1) Handling c as a deter- 
ministic optimization variable taking values in {x G [—1, l] q : Yli=i < iSz Y2i=i x i = 
0} (since no constraints are given on the measure of c) (2) Assuming that the mea- 
sure on each Xij (Xi = (Xn, Xi t 2, ^,3)) is the tensorization of two Dirac masses in 
[— cLmax, ^max] (since E[Xjj] = is one linear constraint) (3) Assuming that the measure 
on each Tj is the convex linear combination of 2 Dirac masses in [0, r max ] (f% < E[rj] < T2 
counts as one linear constraint). 

Observe that this reduced problem is of finite dimension (8B + q = 180) (counting 
the scalar position of the Dirac masses, their weights and subtracting the number of 
scalar equality constraints). 

7.1.5 Numerical results 

The truss structure is the electrical tower shown in Sub- figure 7.1(a). This structure has 
198 elements and we refer to [84] for precise numerical values associated with its geome- 
try. The material used for this structure is steel. The corresponding material properties 
are 7860 kg/m 3 for density, 2.1 -10 11 N/m 2 for the Young's modulus, 2.5- 10 s N/m 2 for the 
yield stress and £ = 0.07 for the (uniform) damping ratio. Calculations were performed 
with time-step At := 5.0- 10 -2 s. We refer to Sub-figure 7.2(a) for a graph of the optimal 
bound on the probability of failure (7.13) versus the maximum ground acceleration 7.11 
(in m • s~ 2 ). Using Esteva's semi empirical formula (7.11) with a hypocentral distance 
R equal to 25 km we obtain Sub- figure 7.1(b), the graph of the optimal bound on the 
probability of failure (7.13) versus the earthquake of magnitude Ml in the Richter (local 
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(a) The truss structure 

Figure 7.1: Numerical results associated with 
7.1.2. 
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the information set defined in Sub-section 



magnitude) scale at hypocentral distance R (the difference AMl between two consecu- 
tive points is 0.25). The "S" shape of the graph is typical of vulnerability curves [51]. 
We select one of the points in the transition region for further analysis — the point 
corresponding to a probability of failure of 0.631, a maximum ground acceleration of 
0.892 m • s -2 and an earthquake of magnitude 6.5. The vulnerability curve undergoes a 
sharp transition (from small probabilities of failures to unitary probabilities of failures) 
around maximum ground acceleration a max = 0.892m • s -2 . This transition becomes 
smoother as the number of independent variables in the description of the admissible set 
is increased (results not shown). 

For a max = 0.892 m • s" 2 (M L = 6.5), Sub-figures 7.2(b) and 7.2(c) show the (de- 
terministic) transfer function ip (the units in the x-axis are seconds) and 3 independent 
realizations of the earthquake source s(t) sampled from the measure ^o.892 maximizing 
the probability of failure. For this measure, Sub-figure 7.2(f) shows the axial strain 
of all elements versus time (in seconds) and Sub-figure 7.1(a) identifies the ten weak- 
est elements for the most probable earthquake (the axial strain of these elements are: 
0.00142317, 0.00125928, 0.00099657, 0.00081897, 0.00076223, 0.00075958, 0.00072190, 
0.00068266, 0.00062919, and 0.00061361) — the weakest two elements exceed the yield 
strain of 0.00119048 (shown in red in the figure). Sub-figures 7.2(d) and 7.2(e) show 3 
independent horizontal ground acceleration and a power spectrum sampled from /io.892- 
The units in Sub-figure 7.2(e) are cycles per seconds for the x axis and m • s -2 for the y 
axis. The units in Sub-figure 7.2(d) are seconds for the x axis and m • s -2 for the y axis. 

An quantitative analysis of the numerical results also show that all the constraints 
are active at the extremum (i.e. the generalized moments inequality constraints on 
/i defining the information set introduced in Sub-section 7.1.2 are equalities or near 
equalities at the extremum) . The positions and weights of the Dirac masses associated 
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(a) Maximum PoF vs a_ 



(b) Transfer function ip 




(a) M L = 6 



(b) M L = 6.5 



(c) Ml = 7 



Figure 7.3: Positions (abscissa, in m • s~ 2 ) and weights (ordinates) of the Dirac masses 
associated with the measure of probability on X%, . . . ,Xb at the extremum for earth- 
quakes of magnitude Ml = 6, Ml = 6.5 and Ml = 7. Note that the positions in abscissa 
correspond to the possible amplitudes of the impulses JQ. 



with durations and transfer coefficients do not appear to show any discernible trend. 
However, the positions and weights of the Dirac masses associated with the amplitudes 
Xi,... ,Xm show a trend (as function of the earthquake magnitude Ml) illustrated in 
Figure 7.3. This trend suggests that for strong earthquakes, probabilities of failures are 
maximized via (the possibility of) large amplitude impulses. 

On the numerical optimization algorithm. Global search algorithms often require 
hundreds of iterations and thousands of function evaluations, due to their stochastic 
nature, to find a global optimum. Local methods, like Powell's method [71], may require 
orders of magnitude fewer iterations and evaluations, but do not generally converge to 
a global optimum in a complex parameter space. To compute the probability of failure, 
we use a Differential Evolution algorithm [72, 88] that has been modified to utilize 
large-scale parallel computing resources [59]. Each iteration, the optimizer prepares m 
points in parameter space, with each new point derived through random mutations from 
the 'best' point in the previous iteration. We select m = 40, which is of modest size 
compared to the dimensionality of the problem — however, we chose this modest size 
because populations larger than m = 40 only modestly increase the efficiency of the 
algorithm. Each of these m evaluations are performed in parallel on a computer cluster, 
such that the time required for a single iteration equals the time required for a single 
function evaluation. After n iterations complete, the optimal probability of failure for 
the product measure is returned (convergence is observed around n ~ 200 and we select 
n ~ 2000 for the robustness of the result). 

Only one iteration is required for values of ground acceleration on the extremes 
of the range (such as Ml = 2 and Ml = 9). The number of iterations required for 
convergence for points in the transition region (around Ml = 6.5) is between 30 and 50 
(which corresponds to 2,400 to 4,000 function evaluations). We refer to Figure 7.4 for 
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(a) Estimated Maximum PoF vs iterations (b) Dirac Positions vs iterations 

Figure 7.4: (a) : Estimated maximum probability of failure versus number of iterations for 
an earthquake of magnitude Ml = 6.5 (this corresponds to the point in transition region 
of Sub-figure 7.1(b)). (b): re-normalized positions of the Dirac masses for Ml = 6.5. 



an illustration of the convergence of the optimization algorithm for Ml = 6.5. 

Each function evaluation takes approximately 0.5 s on a high-performance computing 
cluster (such as the high-performance computing clusters used at the National Labs). 
With each iteration utilizing m = 40 parallel processors, the OUQ calculation takes 
roughly 24hrs. 

Approximately 1000 time steps are required for accuracy in the strain calculations, 
each function evaluation requires two convolutions over time. Because of the size of the 
truss structure (198 elements), eigenvalues have to be computed with high accuracy. 
Because of the size of the product measure associated with the numerical optimization 
iterates, the probability of failure (associated with these iterates) should be estimated 
with a controlled (and adapted) tolerance rather than computed exactly — we use a 
sampling size of 5000 points. 

7.2 OUQ and critical excitation. 

Without constraints on ground acceleration, the ground motion yielding the maximum 
peak response (maximum damage in a deterministic setting) has been referred to as the 
critical excitation [20]. Drenick himself pointed out that a seismic design based on critical 
excitation could be "far too pessimistic to be practical" [21]. He later suggested that 
the combination of probabilistic approaches with worst-case analysis should be employed 
to make the seismic resistant design robust [22]. Practical application and extension of 
critical excitation methods have then been made extensively and we refer to [92] and 
[93] for recent reviews. The probabilities of failures obtained from stochastic approaches 
depend on particular choices of probability distribution functions. Because of the scarcity 
of recorded time-histories, these choices involve some degree of arbitrariness [87, 92] that 
may be incompatible with the certification of critical structures and rare events [23]. We 
suggest that by allowing for very weak assumption on probability measures, the reduction 
theorems associated with the OUQ framework could lead to certifications methods that 
are both robust (reliable) and practical (not overly pessimistic). Of course this does 
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require the identification of a reliable and narrow information set. The set A used in 
this paper does not include all the available information on earthquakes. We also suggest 
that the method of selecting next best experiments could help in this endeavor. 

Observe also that without constraints, worst-case scenarios correspond to focusing 
the energy of the earthquake in modes of resonances of the structure. Without correla- 
tions in the ground motion these scenarios correspond to rare events where independent 
random variables must conspire to strongly excite a specific resonance mode. The lack of 
information on the transfer function ip and the mean values Efr^] permits scenarios char- 
acterized by strong correlations in ground motion where the energy of the earthquake 
can be focused in the above mentioned modes of resonance. 

7.3 Alternative formulation in the frequency domain 

A popular method for modeling and synthesizing seismic ground motion is to use (deter- 
ministic) shape functions and envelopes in the frequency domain (see [96] for a review). 

In this sub-section, we will evaluate the safety of the electrical tower shown in Sub- 
figure 7.1(a) using an admissible set Af defined from weak information on the probability 
distribution of the power spectrum of the seismic ground motion. 

7.3.1 Formulation of the information set 

We assume that the (three dimensional) ground motion acceleration is given by 



where the Fourier coefficients Aj are random variables (in R) of unknown distribu- 
tion. We assume that W := 100 and that u k ■= k/rj with = 20 s. Writing 
A := (Ai, . . . , Aqw), we assume that 



where a max is given by Esteva's semi-empirical expression (7.11) and B(0, a max )\B(0, — ) 
is the Euclidean ball of R 6M/ of center and radius a max minus the Euclidean ball of 
center and radius ^aax. 

Although different earthquakes have different power spectral densities it is empirically 
observed that "on average", their power spectra follow specific shape functions that 
may depend on the ground structure of the site where the earthquake is occurring [47]. 
Based on this observation, synthetic seismograms are produced by filtering the Fourier 
spectrum of white noise with these specific shape functions [47]. In this sub-section, 
our information on the distribution of A will be limited to the shape of the mean value 
of its power spectrum. More precisely, we will assume that, for k £ {1,...,W} and 



w 
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where s is the Matsuda-Asano shape function [56] given by: 

s{uj) := K-- 2 ) 2 + 4^,^' (7 ' 17) 

where oj g and £ g are the natural frequency and natural damping factor of the site and 

w 

s :=Y,s("k). (7.18) 

k=l 

We will use the numerical values u g = 6.24 Hz and £ 9 = 0.662 associated with the Jan- 
uary 24, 1980 Livermore earthquake (see [49], observe that we are measuring frequency 
in cycles per seconds instead of radians per seconds). The purpose of the normalization 
factor (7.18) is to enforce the following mean constraint: 



E 



- [ Td \u (t)\ 2 dt]=lE[\A\*] = ^p. (7.19) 

Td Jo 1 2 4 



Observe also that (7.15) implies that, with probability one, 

^SSS < 1 f Td \u (t)\ 2 dt < (7.20) 
8 r d J 2 

We write Af the set of probability measures fi on A satisfying (7.15) and (7.16). 
7.3.2 OUQ objectives 

Let (Yi, . . . , Yj) and (Si, . . . , Sj) be the axial and yield strains introduced in Sub-section 
7.1.1. Writing S := [—Si, Si] x ••• x [—Sj,Sj] (this is the safe domain for the axial 
strains), we are interested in computing optimal (maximal and minimal with respect to 
measures \i G Af) bounds on the probability (under fi) that Y(t) S for some t G [0, Td] 
(defined as the probability of failure). From the linearity of equations (7.3), the strain 
of member i (i E {1, . . . , J}) at time t can be written 

Y i (t)=Y,*n{t) A r (7-21) 

Let ^>(t) be the J x (QW) tensor (^y(t)) and observe that Equation (7.21) can be also 
be written Y(t) = ty(t)A. Let J- be the subset of M. ew defined as the elements x of 
B(0,a max )\B(0, ezf*) such that V(t)x [-Si, Si] x • • • x [-Sj,Sj] for some t G [0,T d ], 
i.e. 

T:={x£B(0,a max )\B(0,^)\*(t)x£S for some te[0,r d ]j. (7.22) 

Observe T corresponds to the set of vectors A (in (7.14)) that lead to a failure of the 
structure. Henceforth, our objective can be formulated as computing 

sup fl [AGJ 7 ] and inf n[AeF], (7.23) 
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where Af is the set of probability measures [i such that 
fj,[Ae B(0,a max ) \ B(0, ^)] = 1, and that 

E,[A]]= bj with bj:= ^±mml. (7 . 2 4) 

12 so 

In other words, Af an infinite-dimensional polytope defined as the set of probability 
measures on ground acceleration that have the Matsuda-Asano average power spectra 
(7.17). It is important to observe that that with the filtered white noise method the 
safety of the structure is assessed for a single measure fio £ Af whereas in the proposed 
OUQ framework we compute best and worst-case scenarios with respect to all measures 
in Af- 



7.3.3 Reduction of the optimization problem with Dirac masses 

Since (7.24) corresponds to 6W global linear constraints on /i, Theorem 4.1 implies that 
the extrema of problem (7.23) can be achieved by assuming /i to be a weighted sum of 
Dirac masses V -'V ; where Z.j £ B(0, a max )\ J B(0, Pj > and ^=i +1 Pj = 

1. The constraints (7.24) can then be written: for a £ {1, ... , 6W}, Y^i 1 z f,jPj = b i- 
Furthermore, J 7 ] = Ylj-z efPy 

7.3.4 Reduction of the optimization problem based on strong duality 

Since the information contained in Af is limited to constraints on the moments of A, 
strong duality can be employed to obtain an alternative reduction of problems (7.23). 
Indeed, Theorem 2.2 of [1 3] implies that 

sup Jl = inf Ho + ^TlIibi, (7.25) 

where the minimization problem (over the vector (Hq,H) := (Hq, H±, . . . , H§\\r) £ 
M 6H/+1 ) in the right hand side of (7.25) is subject to 

6W 

^H iX ? + H > X (x) on B{0,a max )/B(0,^), (7.26) 

i=\ 

where x{ x ) is the function equal to 1 on J- and on (J 7 ) (we note (J-) c the complement 
of J-, i.e. the set of x in M. GW that are not elements of J-). Similarly, 

mifi[AGT]= sup Ho + ^Hibi, (7.27) 

where the maximization problem in the right hand side of (7.27) is subject to 

6W 

Y,HiX* + H < X (.x) on B{0,a max )/B(0,^). (7.28) 

i=l 
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Figure 7.5: Maximum and minimum probability of failure of the structure (as defined in 
(7.23)) versus the earthquake of magnitude Ml in the Richter (local magnitude) scale 
at hypocentral distance R = 25 km (a max is given by Esteva's semi-empirical expression 
(7.11) as a function of Ml). The curve corresponding to maximum probability of failure 
is not the same as the one given in Sub-figure 7.1(b) because it is based on a different 
information set. 




We conclude from these equations (by optimizing first with respect to Hq) that the 
optimal upper bound on the probability of failure (defined as the probability that the 
displacement Y(t) does not belong to the safe region S for all time t in the interval 

M) is 

aw 

sup J] = inf sup x( x ) + ^ Hi(bi - xf), (7.29) 

whereas the optimal lower bound is 

6W 

Mu\AeJ r ] = sup inf y(x) + V HAbi - xf). (7.30) 

/j.£Af J Hm ew xeB(0, ttm „)/B(0,^) ~{ 

Observe that problem (7.29) is convex in H G M. ew whereas problem (7.30) is concave. 
7.3.5 Numerical results 

The optimal bounds (7.23) can be computed using the reduction to masses to Dirac 
described in Sub-section 7.3.3 or strong duality as described in Sub-section 7.3.4. While 
the latter does not identify the extremal measures it leads to a smaller optimization prob- 
lem than the former (i.e. to optimization variables in M 12W/ , instead of ]^_( ew + 1 ) x ( ew + 1 )y 
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The simplification is allowed by the facts that the response function is well identified, 
that there are no independence constraints, and that the information on A is limited 
to 6W (scalar) moment constraints. The vulnerability curves of Figure 7.5 have been 
computed using strong duality as described in Subsection 7.3.4 (an identification of ex- 
tremal measures would require using the method described in Subsection 7.3.3). Observe 
that to decrease the gap between the maximum probability of failure and the minimum 
probability of failure, one would have to refine the information on the probability distri- 
bution of ground motion (by, for instance, adding constraints involving the correlation 
between the amplitudes A{ different Fourier modes). To solve optimization problems 
(7.29) and (7.30) we use the modified Differential Evolution algorithm described in Sub- 
section 7.1.5. Equation (7.29) is implemented as a minimization over H, where a nested 
maximization over x is used to solve for snp xeB f 0av \/ B /Q °max \ \(x) — Yl^i-Hixj at 
each function evaluation. Both the minimization over H, and the maximization over 
x use the Differential Evolution algorithm described above, where the optimizer config- 
uration itself differs only in that for the nested optimization termination occurs when 
the maximization over x does not improve by more than 10 -6 in 20 iterations, while 
the outer optimization is terminated when there is not more than 10 -6 improvement 
over 100 iterations. The optimization over H is performed in parallel, as described in 
Subsection 7.1.5, where each of the nested optimizations over x are distributed across 
nodes of a high-performance computing cluster. Each of the (nested) optimizations over 
x require only a few seconds on average, and thus are performed serially. Convergence, 
on average takes about 15 hours, and is obtained in roughly 2000 iterations (over H), 
corresponding to 35000 to 50000 function evaluations. Each function evaluation is a 
nested optimization over x, which takes a few seconds on a high-performance computing 
cluster. 



8 Application to Transport in Porous Media 

We now apply the OUQ framework and reduction theorems to divergence form elliptic 
PDEs and consider the situation where coefficients (corresponding to microstructure and 
source terms) are random and have imperfectly known probability distributions. Treat- 
ing those distributions as optimization variables (in an infinite-dimensional space) we 
obtain optimal bounds on probabilities of deviation of solutions. Surprisingly, explicit 
and optimal bounds show that, with incomplete information on the probability distri- 
bution of the microstructure, uncertainties or information do not necessarily propagate 
across scales. 

To make this more precise in a simple setting, let T> C R be a bounded domain and 
consider u(x,u), the solution of the following stochastic elliptic PDE: 

- div(«(x, u)X7u(x, u)) = f{x, u), x G V 
u(x, oS) = 0, x £ dV 

with random microstructure n and random (positive) source term /. Physically, u can 
be interpreted as the pressure (or head) in a medium of permeability k with source /; the 
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fluid velocity is given by Vu. For a given point xo in the interior of T>, we are interested 
in computing the least upper bound on the probability of an unsafe supercritical pressure 
at xo: 

U(A) := supE^loguOo,^) > E^[logu(x ,uj)} + a] , (8.2) 

where A is a set of probability measures on (k, f). In this section we will focus on the 
two admissible sets A described below. 

Let D 1 ,D 2 > 0, K,F G L°°(V) such that essinf© K > 0, F > 0, and f v F > 0. 
Define 

k, f independent under fj,, 
l ■= \H K(x) < k(x,oj) < e Dl K(x), } . (8.3) 
F(x) < f(x,ui) < e D2 F(x) 



A 



We say that a function g defined on T> is periodic of period 5 if for all x & T>, it holds 
that g(x) = g(x + 5) whenever x + 5 £ T>. We now define 



A^^2 •" 



K = KlK 2 , 

K\,K2 independent under /i, 
||Vki|U« < e Dl ||V^i|| Loo 
K2 periodic of period 5 
Ki{x) < ki(x,w) < e Dl Ki(x) 
K 2 (x) < k 2 {x,oj) < e D *K 2 (x) 



(8.4) 



where 0<<5<1, K 2 G L°°{T>) is uniformly elliptic over T> and periodic of period 5, and 
K\ is smooth and uniformly elliptic over T>. 

PDEs of the form (8.1) have become a benchmark for stochastic expansion methods 
[29, 101, 4, 26, 19, 94, 14] and we also refer to [30] for their importance for transport in 
porous media. 

These PDEs have also been studied as classical examples in the UQ literature on 
the basis that the randomness in the coefficients (with a perfectly known probability 
distribution on the coefficients (k, f)) is an adequate model of the lack of information on 
the microstructure k. In these situations the quantification of uncertainties is equivalent 
to a push forward of the measure probability on (k, /). 

However, in practical situations the probability distribution on the coefficients (k, f) 
may not be known a priori and the sole randomness in coefficients may not constitute 
a complete characterization of uncertainties. This is our motivation for considering the 
problem described in this section. We have also introduced the admissible set (8.4) as 
a simple illustration of uncertainty quantification with multiple scales and incomplete 
information on probability distributions. To relate this example to classical homogeniza- 
tion [8] we have assumed k 2 to be periodic of small period S <C 1. 

Theorem 8.1. We have 



U{A Kj ) =U(A K1K2 ) = U{A M cd) 



(8.5) 



G8 



with 

'O, ifD 1 +D 2 <a 
(D 1 + D 2 - a) 2 



±D X D 2 

1 -, ifO < a < \Di - D 2 



if\D 1 -D 2 \<a<D 1 + D 2 , ( 8 . 6 ) 



m&x(Di, D 2 ) 

Before giving the proof of Theorem 8.1, we make a few important observations: 
It follows from Theorem 8.1 that if D 2 > a + D%, then U(A K j)(a, D u D 2 ) = 
U(A K /)(«, 0, D 2 ). In other words, if the uncertainty on the source term / is domi- 
nant, then the uncertainty associated with the microstructure, k, does not propagate 
to the uncertainty corresponding to the probability of deviation of logu(xo, ui) from its 
mean. 

Now consider A KlK2 . Since K\ is constrained to be smooth and k 2 periodic with 
period 5 -C 1, one would expect the microstructure k 2 to appear in the probability 
of deviation in a homogenized form. However, Theorem 8.1 shows that if D\ > a + 
D 2 , then 14 (A n 1 K2){ a i D\, D 2 ) — l4(A K1K2 )(a, D\,0). More precisely, if the uncertainty 
associated with the background K\ is dominant, then the uncertainty associated with the 
microstructure k 2 does not propagate to the uncertainty corresponding to the probability 
of deviation of logu(xo,oj) from its mean. 

This simple but generic example suggests that for structures characterized by mul- 
tiple scales or multiple modules, information or uncertainties may not propagate across 
modules or scales. This phenomenon can be explained by the fact that, with incom- 
plete information, scales or modules may not communicate certain types of information. 
Henceforth, the global uncertainty of a modular system cannot be reduced without de- 
creasing local dominant uncertainties. In particular, for modular or multi-scale systems, 
one can identify (possibly large) accuracy thresholds (in terms of numerical solutions of 
PDEs or SPDEs) below which the global uncertainty of the system does not decrease. 

Proof of Theorem 8.1. Let us now prove Theorem 8.1 with the admissible set A K j (the 
proof with the set A KlK2 is similar). It follows from Theorem 2.11 and Proposition 2.13 
of [66] that the maximum oscillation of log u(xq,uj) with respect to k and / are bounded 
by D\ and D 2 we obtain that 

U{A K j) <U(A U cd), (8.7) 

where 14(Amcd) is defined in equation (4.12) (we consider the case m = 2). 

Next, from the proof of Theorem 5.2, we observe that the bound U(Amcd) can be 
achieved by A K j by considering measures n that are tensorizations of two weighted Dirac 
masses in k (placed at K and e Dl K) and two weighted Dirac masses in / (placed at F 
and e Dl F). This concludes the proof. □ 

9 Conclusions 

The UQ Problem — A Problem with UQ? The 2003 Columbia space shuttle ac- 
cident and the 2010 Icelandic volcanic ash cloud crisis have demonstrated two sides of the 
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same problem: discarding information may lead to disaster, whereas over-conservative 
safety certification may result in unnecessary economic loss and supplier-client conflict. 
Furthermore, while everyone agrees that UQ is a fundamental component of objective 
science (because, for instance, objective assertions of the validity of a model or the certi- 
fication of a physical system require UQ), it appears that not only is there no universally 
accepted notion of the objectives of UQ, there is also no universally accepted framework 
for the communication of UQ results. At present, the "UQ problem" appears to have 
all the symptoms of an ill-posed problem; at the very least, it lacks a coherent general 
presentation, much like the state of probability theory before its rigorous formulation by 
Kolmogorov in the 1930s. 

• At present, UQ is an umbrella term that encompasses a large spectrum of meth- 
ods: Bayesian methods, Latin hypercube sampling, polynomial chaos expansions, 
stochastic finite-element methods, Monte Carlo, etc. Most (if not all) of these 
methods are characterized by a list of assumptions required for their application 
or efficiency. For example, Monte Carlo methods require a large number of sam- 
ples to estimate rare events; stochastic finite-element methods require the precise 
knowledge of probability density functions and some regularity (in terms of decays 
in spectrum) for their efficiency; and concentration-of-measure inequalities require 
uncorrelated (or weakly correlated) random variables. 

• There is a disconnect between theoretical UQ methods and complex systems of 
importance requiring UQ in the sense that the assumptions of the methods do not 
match the assumption/information set of the application. This disconnect means 
that often a specific method adds inappropriate implicit or explicit assumptions 
(for instance, when the knowledge of probability density functions is required for 
polynomial chaos expansions, but is unavailable) and/or the repudiation of rele- 
vant information (for instance, the monotonicity of a response function in a given 
variable) that the method is not designed to incorporate. 

OUQ as an opening gambit. OUQ is not the definitive answer to the UQ prob- 
lem, but we hope that it will help to stimulate a discussion on the development of a 
rigorous and well-posed UQ framework analogous to that surrounding the development 
of probability theory. The reduction theorems of Section 4, the Optimal Concentration 
Inequalities and non-propagation of input uncertainties of Section 5, the possibility of 
the selection of optimal experiments described at the end of Section 2, and the numerical 
evidence of Section 6 that (singular, i.e. low-dimensional) optimizers are also attractors, 
suggest that such a discussion may lead to non-trivial and worthwhile questions and 
results at the interface of optimization theory, probability theory, computer science and 
statistics. 

In particular, many questions and issues raised by the OUQ formulation remain to 
be investigated. A few of those questions and issues are as follows: 

• Any (possibly numerical) method that finds admissible states (/, fj,) £ A leads to 
rigorous lower bounds on U{A). It is known that duality techniques lead to upper 
bounds on (/, fi) £ A provided that the associated Lagrangians can be computed. 
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Are there interesting classes of problems for which those Lagrangians can rigorously 
be estimated or bounded from above? 

• The reduction theorems of Section 4 are limited to linear constraints on probability 
distribution marginals and the introduction of sample data may lead to other 
situations of interest (for instance, relative-entropy type constraints). 

• Although general in its range of application, the algorithmic framework introduced 
in Section 6 is still lacking general convergence theorems. 

• The introduction of sample data appears to render the OUQ optimization problem 
even more complex. Can this optimization problem be made equivalent to applying 
the deterministic setting to an information set A randomized by the sample data? 

• In the presence of sample data, instead of doing theoretical analysis to describe 
the optimal statistical test, one formulation of the OUQ approach provides an op- 
timization problem that must be solved to determine the test. Is this optimization 
problem reducible? 
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10 Appendix: Proofs 
10.1 Proofs for Section 4 

Proof of Theorem 4-1- In this proof, we use (/ij, . . . , /z m ) as a synonym for the product 
Hi <S> ■ ■ ■ <8> Hm- For jjL = (^)™ 1 Hi € Ai G , consider the optimization problem 

maximize: E^,^,...,^) M, 
subject to: fj,'i S M(Xi), 

G(// x ,/X 2 , • • -,fJm) < 0- 

By Fubini's Theorem, 

E (Mi,W,..,Aw)M = E Mi [ E (M2,..,^ m )M] . 
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where ... 1(Uot )H is a Borel-measurable function on X\ and, for j = 1,... ,n, it holds 
that 

E^,^,..,^)^] =E Mi [% 2 ,..., Mm )[^]] , 
where IE( At2i ... iMm )fi ,/ j is a Borel-measurable function on A4. In the same way, we see that 

Vi^,...^)^]] =V 1 b)] ! 

and, for k = 2, . . . , m and j = 1, . . . , n^, it holds that 

^ >2 ,...^ m) [g^=E tlk [g k j 



which are constant in ji^. 

Since each X{ is Suslin, it follows that all the measures in M{Xi) are regular. Con- 
sequently, by [95, Theorem 11.1], the extreme set of Ai(Xi) is the set of Dirac masses. 
For fixed (^2, ■ ■ ■ , Mm), let Gi C M.(Xi) denote those measures that satisfy the con- 
straints G(/x^, Hi-, • • • , fJ. m ) < 0. Consequently, since for k = 2, . . . , m and j = 1, . . . , rik, 

|W is constant in fjf^, it follows from [100, Theorem 2.1] that the extreme 

set ex(Gi) C Ai(Xi) of the constraint set consists only of elements of A ni+n i(Xi). In 
addition, von Weizsacker and Winkler [99, Corollary 3] show that a Choquet theorem 
holds: let // satisfy the constraints. Then 



Jex(Gi) 



/ex(Gi) 

for all Borel sets B C Xi, where p is a probability measure on the extreme set ex(Gi). 

According to Winkler, an extended-real-valued function K on Gi is called measure 
affine if it satisfies the barycentric formula 



K(jjf)= [ K{u)dp{u). 

Jex(Gi) 



'ex(Gi) 

When K is measure affine, [100, Theorem 3.2] asserts that 

sup K(/j,') = sup K(v), 
/i'6Gi i/6cx(Gi) 

and so we conclude that 

sup K(fi') = sup K{v) < sup 
At'eGj yeex(Gi) i/eA Jll+n ,(Afi)nG 1 

However, since 

sup K{y) < sup K(u), 

^GA ni+n ,(A'i)nGi z^eGi 

it follows that 

sup K(fi') = sup K{v). 
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To apply this result, observe that [100, proposition 3.1] asserts that the evaluation 
function 

is measure affine. Therefore, 

SUP E( M >2,..,M m )M = SU P E (M' 1 ,M2 > ...,Mm)[ r ']- t 10 - 1 ) 

MieA ni+n ,(^) 

G(At' 1: ^2,-,Mm)<0 G(Ati,/i2,...,Atm)<0 

Now let e > and let //* G A ni+n /(Afi) be e-suboptimal for the right-hand side of (10.1): 
that is, G(fil, fi2, • • • j A*m) < 0, and 

%* )M2l ..., Mm )M > sup 2i ... iMm) [r] -e. 

Mi6A„ i+n ,(^) 

G(/4,M2>— ,Mm)<0 

Hence, by (10.1), 

E(^ )M2) ...^ m ) [r] > sup E {// ...^ m) [r] - e > E (miAt2v ..^ m) [r] - e. 

G(/i' 1 ,/i2v,^m)<0 

Consequently, the first component of \i by can be replaced some element of A ni+n /(Ai) to 
produce a feasible point // G A4 G without decreasing E[r] by more than e. By repeating 
this argument, it follows that for every point fi G M G there exists a // G M a such that 

E„,[r] > E M [r]-me. 

Since £ was arbitrary the result follows. □ 

Proof of Corollary 4-4- Simply use the identity 

U{A) = sup E^[ry]=sup sup E^[ry] 
(MeA ' feg^M m (x) 
G(/,m)<o 

and then apply Theorem 4.1 to the inner supremum. □ 
Proof of Theorem 4-7. Corollary 4.4 implies that U(A) = U(Aa) where, 



{m 
(/,M)Gax(g)A n (^) 
i=i 



E M [^o/] < for allj = 1, 



For each i = 1, . . . , m, the indexing of the Dirac masses pushes forward the measure // 
with weights a\, k = 0, . . . , n to a measure a 1 on M with weights a\, k = 0, . . . , n. Let 
a := ®£Li a * denote the corresponding product measure on V = M m . That is, we have 
a map 



A: (g)A n (*i) ^.M m (P) 



i=l 
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and the product map 



Fx A: g x (g)A n (^) x M m (V). 

8=1 

Since for any function g : R — >• R, we have F(g o /, //) = 50 F(/, it follows that for 
any (/, Jx ®™ 1 A n (^) that 

o /] = E Qfi [F(<? o /, M )] = E Qfi [ 5 o F(/, fx)] . 

Consequently, with the function 1Z V : J© x M m (V) ->■ R defined by 

K v (h,a) :=E a [roh], 

and for each j = 1, . . . , n, the functions Gj 5 : J-£> x A4 m (V) — >■ R defined by 

Gf(/i,a) o fc], 

we have that, for all (/,//) 6 Jx (g)™ 1 A n (^), 

n(f,/j)=n v (F(f,n),a li ), (10.2) 

and, for all j = 1, . . . , n and all (/, //) £ J 7 © x 0™ 1 A n (A£), 

G,(/, M ) = G?(F(/, (10.3) 

where := A(/i). 
That is, 

ft = K v o (F x A) , 

Gj = Gf o (F x A) for each j = 1, . . . , n. 

Consequently, any (/, (j,) £ Aa is mapped by F x A to a point in T x _A/f m (£>) that 
preserves the criterion value and the constraint, and by the assumption must lie in 
Qv x M m (V). This establishes U(Aa) < U{A V )- 

To obtain equality, consider (h,a) E Ax>- By assumption, there exists an (/,//) € 
G x x A n (^j) such that F(/, fi) = h. If we adjust the weights on \x so that A(/i) = a, 
we still maintain F(/, fj,) = h. By (10.2) and (10.3), this point has the same criterion 
value and satisfies the integral constraints of A a- The proof is finished. □ 

Proof of Proposition 4-8. Let I := l[ a ,oo) be the indicator function and consider rj := 
I o f so that ji[f > a] = E^[7 o /]. Since I o / is integrable for all fi G A4 m (X) and we 
have one constraint E„[/] < 0, the result follows from Theorem 4.7, provided that we 
have 



74 



To establish this, consider / G Q and observe that for all /j, G ®^=i ^i(^) it holds that 
Hf,p) G Qd- Therefore, we conclude that F (Q x ®™iA 1 (;t i )) C Q v . On the other 
hand, for any h G <5x>, we can choose a measurable product partition of <Y dividing each X{ 
into 2 cells. We pull back the function h to a function / G T that has the correct constant 
values in the partition cells, and place the Dirac masses into the correct cells. Set the 
weights to any nonzero values. It is easy to see that / G Q. Moreover, we have a measure 
fj, which satisfies F(/, fj) = h. Therefore, we conclude that F (Q x Ai(A^)) D Qv- 

This completes the proof. □ 

Proof of Theorem 4-9. First, observe that Qx> is a sub-lattice of Tx> in the usual lattice 
structure on functions. That is, if h\,h2 G Qv-, then it follows that both min(/ii, /12) G £/x> 
and max(/ii,/i2) G Therefore, for any admissible (h,a) G At>, it follows that 

clipping h at a to min(/i, a) produces an admissible (min(fa, a), a) and does not change 
the criterion value a[h > a}. Consequently, we can reduce to functions with maximum 
value a. Moreover, since each function h s is in the sub-lattice Q-p, it follows that h G 
Gv,C G C. For C G C, define the sub-lattice 

C v ■= {h G F v I {s I /i(s) = a} = C}} 

of functions with value a precisely on the set C . Now, consider a function h G £/x> 
such that h < a and let C be the set where h = a. It follows that 7i < h, h c G ^x> 5 
and fa G Cx>. Since h < h implies that E a [/i c ] < E a [/i] for all a, it follows that 
replacing (h,a) by (h c ,a) keeps it admissible, and a[h c > a] = a[h > a]. The proof is 
finished. □ 

10.2 Proofs for Section 5 

The proofs given in this subsection are direct applications of Theorem 4.9. In partic- 
ular, they are based on an analytical calculation of (4.19). Because Proposition 5.7 is 
fundamental to all the other results of the section, its proof will be given first. 

Proof of Proposition 5.7. When non-ambiguous, we will use the notation Ef/i^ ] for 
E a [h Co ] and ¥[h c ° > a] for a[h c ° > a}. First, observe that, if J2jLi D j ^ a > then 
mm(h Co ) > 0, and the constraint Ef/i^ ] < imply ¥[h Co = 0] = 1. This proves the first 
equation of (5.9). Now, assume a < Y^JLi Dj and observe that 

m 

h c ^{s) = a-Y J ^-s ] )D 3 . 
i=i 

It follows that 

m 

E a [h Co ] = a-J2( 1 - a j) D j- ( 10 - 4 ) 

i=i 

If D m = 0, then the optimum is achieved on boundary of [0, l] m (i.e. by taking a m = 1 
since Co = {(1,...,1)} and since h c ° does not depend on s<m,) ctncL the optimization 
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reduces to an (m — l)-dimensional problem. For that reason, we will assume in all of 
the proofs of the results given in this section that all the DjS are strictly positive. The 
statements of those results remain valid even if one or more of the DjS are equal to zero. 
The condition D m > implies that min(Z?i, . . . , D m ) > and that 

m 

a[h Co > a] = JJoy. (10.5) 

3=1 

If the optimum in a is achieved in the interior of the hypercube [0,l] m , then at that 
optimum the gradients of (10.4) and (10.5) are collinear. Hence, in that case, there 
exists A £ R such that for all £ G {1, ... , m}, 

11,-1 a i 

3 = XDi. (10.6) 

Since a[h c ° > a] is increasing in each cxj, the optimum is achieved at E a [/i c ' ] = 0. 
Substitution of (10.6) into the equation E Q [/i Co ] = yields that 



EJLl D 3 ~ a 

and, hence, 



a ' = "U ■ (io - 7) 

For all i € {1, ... , m}, the condition < «j < 1 is equivalent to a < Y%Li Dj and 

m 

^DjKa + mD,. (10.8) 
3=1 

It follows that for Y^jLi ®j ~ fnD m < a < Y^jLi Dj, the a defined by (10.7) lies in the 
interior of [0, l] m and satisfies 



a[h Co > a] - V 



m m UT=l D J 

If a < Sj=i — mD m , then the optimum is achieved on boundary of [0, l] m (i.e. by 
taking a m = 1, since Co = {(1, . . . , 1)}), and the optimization reduces to an (m — 1)- 
dimensional problem. 

To complete the proof, we use an induction. Observe in particular that, for k < m—1, 

(ZU D 3-*) k (ES^--«) fc+1 



k k uU D 3 "(^+i) fc+l nS^ 
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for a = Ylj=i Dj — (k + l)D k +i, and that 



(gjU Dj ~ «) fc ^ Dj ~ «) fc+1 



(10.9) 



is equivalent to a > Y^,j=i Dj — (k + l)Dfc + i. Indeed, writing a 
l)Dk + i + b, equation (10.9) is equivalent to 



b 



) 




6 



) 



fc+i 



1 - 



(k + l)D k+1 



kD k+1 



The function / given by f(x) := (l — ^) x is increasing in x (for < y < x): simply 
examine the derivative of log /, and use the elementary inequality 



We will now give the outline of the induction. It is trivial to obtain that equation 
(5.9) is true for m = 1. Assume that it is true for m = q — 1 and consider the case m = q. 
Equation (10.7) isolates the only potential optimizer a q , which is not on the boundary 
of [0, l] g (not (q — l)-dimensional). 

One obtains that equation (5.9) holds for m = q by comparing the value of a[h c ° > a] 
at locations a isolated by equations (10.7) and (10.8) with those isolated at step q — 1. 
This comparison is performed via equation (10.9). 

More precisely, if a q (the candidate for the optimizer in a isolated by the previous 
paragraph) is not an optimum, then the optimum must lie in the boundary of [0, l] q . 
Hence, the optimum must be achieved by taking on = 1 for some i E {1, . . . , q}. Observ- 
ing that U{Ac ) is increasing in each Di, and since D q = minj g { lj q y Di, that optimum 
can be achieved by taking i = q, which leads to computing U (Ac ) with {D\, . . . , 
where we can use the (q — l)-step of the induction. Using equation (10.9) for k = q—1, we 
obtain that a q is an optimum for a > Ylj=i Dj — qD q , and that, for a < Ylj=i Dj — qD q , 
the optimum is achieved by calculating U{Ac ) with q — 1 variables and (Dj, . . . , D q -{). 
This finishes the proof by using the induction assumption (see formula (5.9)). □ 

The following two lemmas illustrate simplifications that can be made using the sym- 
metries of the hypercube: 

Lemma 10.1. Let Co € C. If Co is symmetric with respect to the hyperplane containing 
the center of the hypercube and normal to the direction i, then the optimum of (5.8) can 
be achieved by taking a» = 1. 

Proof. The proof follows by observing that if Cq is symmetric with respect to the hyper- 
plane containing the center of the hypercube and normal to the direction i, then h °(s) 
does not depend on the variable Sj. □ 

The following lemma is trivial: 



log(l -z) + 



z 



> for < z < 1. 



l-z 
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Figure 10.1: For m = 2, the optimum associated with U(Ac) can be achieved with C = 
{(1, 1)}. For that specific value of C, the linearity of h c (s) = a — D\(l — s\) — Z^l — £2) 
implies U(Amd) = U(Amcd)- 



Lemma 10.2. Let (a,C) be an optimizer of (4.19). Then, the images of (a,C) by 
reflections with respect to hyperplanes containing the center of the hypercube and normal 
to its faces are also optimizers of (4.19). 

The proofs of the remaining theorems now follow in the order that the results were 
stated in the main part of the paper. 

Proof of Theorem 5.1. The calculation of U(Ac) for m = 1 is trivial and also follows 
from Proposition 5.7. □ 



Proof of Theorem 5.2. Write C\ = {(1,1)} (see Figure 10.1). Theorem 5.2 is a conse- 
quence of the following inequality: 

maxW(ico) < U(A Cl ) (10.10) 
C06C 

Assuming equation (10.10) to be true, equation (5.3) is obtained by calculating U{Ac l ) 
from Proposition 5.7 with m = 2. Let us now prove equation (10.10). Let Co G C; we 
need to prove that 

U(A Co ) <U(A Cl ). (10.11) 

By symmetry (using Lemma 10.2), it is no loss of generality to assume that (1, 1) G Cq. 
By Lemma 10.1 the optima for Co = {(1, 1), (1, 0)} and Co = {(1, 1), (0, 1)} can be 
achieved with C\ by taking ct\ = 1 or oli = 1. 

The case C = {(1,1); (1,0); (0, 1); (0,0)} is infeasible. 

For C = {(1,1), (1,0), (0,1)}, we have ¥[h Co = a] = /3 and E[h Co ] = a - (1 - 
/3) min(Z>i, D2) with j3 = 1 — (1 — a\)(l — 02) (recall that h c ° is defined by equation 
(4.17)). Hence, at the optimum (in a), 

r n , I 1 — 0/ min(Di , Do), if a < min(Di , Do), 
mh Co =a} = < 1 V ' ; ' V ' lh 10.12 

|0, if a > min(Z?i,D 2 ). 
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(a) Ci 



(b) C 2 



Figure 10.2: For m = 3, the optimum associated with U(Ac) can be achieved with 
d = {(1,1,1)} (leading to T x ) or C 2 = {(1, 1, 1), (0, 1, 1), (1, 0, 1), (1, 1, 0)} (leading 
to J r 2). The linearity of h Cl (s) = a — D\(l — s{) — D 2 (l — s 2 ) — ^3(1 — S3) implies 
that U(Amd) = M(AmcT>) when T\ > T%. Similarly, the nonlinearity of h° 2 leads to 
U(Amd) <K(Amcb) when F\ < T 2 . 



Equation (10.11) then holds by observing that one always has both 



mm(Di,D 2 ) 



< 1 



and 



< 



max(Di, D 2 ) 

(gi + D 2 - a) 2 
±D X D 2 



min(Di, D 2 

The last inequality is equivalent to [D% — D 2 + a) 2 > 0, which is always true. The 
case Co = {(1, 1), (0, 0)} is bounded by the previous one since F[h c ° = a] = /3 and 
K[h Co ] = a(3 - (1 - P) min(Di, D 2 ) with /3 = a x a 2 + (1 - t*i)(l - a 2 ). This finishes the 
proof. □ 



Proof of Theorem 5.4- Recall that 

U(AmcT>) = m&xU(Ac )- 

It follows from Proposition 5.7 that F\ corresponds to U(Ac 1 ) with C\ = {(1,1,1)}. 
Write C 2 = {(1, 1, 1), (0, 1, 1), (1, 0, 1), (1, 1, 0)} (see Figure 10.2). Let us now calculate 
U{Ac 2 ) (-^2 corresponds to l4(Ac 3 ), which is the optimum, and it is achieved in the 
interior of [0, l] 3 ). We have F[h C2 = a] = a 2 a^ + a\a^ + a\a 2 — 2a\a 2 a^, and 

E[h° 2 } = a- D 2 {1 - ai)(l - a 2 ) - D 3 ((1 - a 2 )(l - a 3 ) + (1 - ai)a 2 (l - a 3 )) . 
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An internal optimal point a must satisfy, for some A G R, 

a 2 + a 3 - 2a 2 a 3 = A (D 2 (l - a 2 ) + D 3 a 2 (l - "3)) , (10.13a) 

a\ + a 3 — 2ai«3 = A (^2(1 — Q!i) + D 3 a\(l — 03)) , (10.13b) 

a.% + «2 — 2ai«2 = A (1)3(1 — 0102)) • (10.13c) 



If we multiply the first equation by a\ and subtract the second equation multiplied by 
a 2 , the we obtain that 

(ai — a 2 )a 3 = \D 2 (ai — a 2 ), 

which implies that either ct\ = a 2 or a 3 = XD 2 . 

Suppose that at 7^ a 2 , so that a 3 = XD 2 . Subtraction of the second equation in 
(10.13) from the first yields 

(a 2 - ai)(l - 2a 3 ) = A {-D 2 (a 2 - a x ) + D 3 (a 2 - ai)(l - a 3 )) , 

which implies that either a\ = a 2 or 

1 - 2a 3 = A {-D 2 + D 3 (l - a 3 )) . 

Since 03 = XD 2 , this becomes 

a 3 D 3 

l-a 3 = (1 - a 3 ), 

which implies, since a 3 7^ 1, that a 3 = Therefore, A = -j^. Therefore, the third 
equation in (10.13) becomes 

a\ + a 2 — 2a\a 2 = A (-£>3(1 — ct\ct 2 )) = 1 — a\a 2 , 

which amounts to 

a\ + a 2 — a\a 2 = 1, 

which in turn amounts to a\(l — a 2 ) = 1 — a 2 . Since a 2 7^ 1, we conclude that at = 1, 
contradicting the supposition that a is an interior point. Therefore, a\ = a 2 and 
equations (10.13), with a := a% = a 2 , become 

a + a 3 - 2aa 3 =A {D 2 {1 - a) + D 3 a(l - a 3 )) (10.14a) 

2a -2a 2 = A (D 3 (l - a 2 )) . (10.14b) 

Hence, 

W[h c ' 2 = a] = 2aa 3 + a 2 - 2a 2 a 3 (10.15) 

and 

E[h° 2 } = a- D 2 (l - a) 2 - D 3 ((1 - a 2 ){\ - a 3 )) . 

The hypothesis that the optimum is not achieved on the boundary requires that 

D 3 ,0 < a < 1,D 2 + D 3 > a and E[h C2 } = 0. 
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The condition E[/i c ' 2 ] = is required because equation (10.15) is strictly increasing along 
the direction a = a 3 . 

Suppose that those conditions are satisfied. The condition Ef/i^ 2 ] = implies that 

a D 2 (l-a) 

1 - q 3 - 



D 3 (l -a 2 ) D 3 (l + a 
which in turns implies that 



a Do(l — a) , 

0-3 = 1 7tt + —, r. 10.16) 

D 3 (l -a 2 ) D 3 (l + a) y 1 



Substitution of (10.16) into (10.15) yields that P[/i° 2 = a] = $?(a), with 

T . . 9 . 9 . / a 1 Do 1 — a 

^(a) = a 2 + 2(a-a 2 )[l-—- 5-+ " 



D 3 (1 — a 2 ) D 3 l + a)' 
Hence, 

9 a a Do (1 — a) 2 
#(a) = 2a - a 2 - 2— + ^TT a 



D 3 1 + a D 3 1 + a 
fy(a) can be simplified using polynomial division. In particular, using 

a flzfO! = (1 _ o) ._(lz££ i 

1 + a 1 + a 

a- — = a 2 + 1 - 2a - (1 + a) + 4 



1 + a 1 + a 

where the last step is obtained from 

(1 - a ) 2 = (a + 1 - 2) 2 = (q + l) 2 - 4(1 + a) + 4, 



we obtain that 



¥(a) = 2a - a 2 - 2—- + 2—^ 4 + a 2 - 3a 



L> 3 1 + a L> 3 V 1 + " 

Therefore, 

o ( D 2 \ 2a a / D 2 \ £> 2 a 

, (a) = ( 2 _ - ,j - __ + 2a (l - 3-) + 8- — 



and 

»(«,) = a" ^ - Ij - 2 a (3^ - Ij + ^ ^ - . (10.17) 

Equation (10.17) implies that 

£> 3 *'(a) = 2a(2L> 2 - D 3 ) + 2(D 3 - 3D 2 ) - 1 (2a - 8D 2 ) 

(1 + a) z 
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The equation ^'(a) = is equivalent to equation (5.6). An interior optimum requires 
the existence of an a S (0, 1) such that VP' (a) = and a 3 G (0, 1), which leads to the 
definition of T%. This establishes the theorem for the T 2 case. 

Next, using symmetries of the hypercube and through direct computation (as in the 
m = 2 case), we obtain that 

C / C 2 =>- U(Ac ) < U(A Cl ). (10.18) 

For the sake of concision, we will give the detailed proof of (10.18) only for 

C 3 = {(1,1,1), (0,1,1), (1,0,1)}. 

This proof will give an illustration of generic reduction properties used in other cases. To 
address all the symmetric transformations of C 3 , we will give the proof without assuming 
that Di,D2 and D 3 are ordered. Let us now consider the C 3 scenario. If the optimum in 
a is achieved on the boundary of [0, l] 3 , then equation (10.10) implies U(Ac 3 ) < U{Ac 1 )- 
Let us assume that the optimum is not achieved on the boundary of [0, l] 3 . Observe 
that 

h° 3 (si, S 2 , 0) = h c *( Sl ,s 2 A)-D 3 . (10.19) 

Combining (10.19) with 

E[h° 3 } = a 3 E[h C3 ( Sl ,s 2 ,l)} + (1 - a 3 )E[h C3 ( Sl ,s 2 ,0)} 

implies that 

E[h c *} = E[h c *( Sl , 8 2 , 1)] - (1 - a 3 )Z> 3 . 

Furthermore, 

F[h° 3 = a}= a 3 F{h C3 ( Sl ,s 2 ,l) = a]. (10.20) 

Maximizing (10.20) with respect to a 3 under the constraint Ef/i*^ 3 ] < leads to E[/i c ' 3 ] = 
(because P[/i C3 = a] and E[h C3 } are linear in a 3 ) and 

a, = 1 - E[ " C " ( '" S2 - 1)I . (10.21) 
^3 

Observe that the condition a 3 < 1 requires E[h Cz (si, s 2 , 1)] > 0. If E[h C3 (si, s 2 , 1)] < 
then 03 = 1, and the optimum is achieved on the boundary of [0, l] 3 . 

The maximization of F[h C:i (s\, s 2 , 1) = a] under the constraint E[h Cs (s\, s 2 , 1)] < E 
(where E is a slack optimization variable) leads to (using the m = 2 result) 

Flh c *( Sl ,s 2 A) = a] = l Ul ' t:) 



min(-Di, D 2 



iia-E < mm(D 1 ,D 2 ), and F[h C3 {s 1 ,s 2l 1) = a] = otherwise. It follows from (10.21) 
and (10.20) that if the optimum is achieved at an interior point, then the optimal value 
of Pf/i^ 3 = a] is achieved by taking the maximum of 



»[/i C3 



E \ ( a-E 



D 3 J V min(Di,£> 2 ) 
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with respect to E with the constraints < E < D3 and a — min(Z)i , D 2 ) < E. If the 
optimum is not achieved on the boundary of [0, l] 3 then one must have 

D 3 + mm{D 1 ,D 2 )-a 
& = , 



which leads to 

, r , (D 3 + min(Di, J D 2 ) -a) 2 
P /i c 3 = a = v /' 2 \ ; . 10.22 

1 J 4£> 3 min(£>i,L>2) ^ ' 

Comparison of (10.22) and (5.3) implies that U(Ac 3 ) < U(Ad), by Proposition 5.7. □ 

Proof of Proposition 5. 9. The idea of the proof is to show that h can be chosen so that 
C contains only one vertex of the hypercube, in which case we have the explicit formula 
obtained in Proposition 5.7. 

First, observe that if a > ^j^i 1 then it is not possible to satisfy the constraint 
1E q [/i c ] < whenever C contains two or more vertices of the hypercube. Next, if C 
contains two vertices s , s 2 of the hypercube, and the Hamming distance between those 
points is 1, then C is symmetric with respect to a hyperplane containing the center of the 
hypercube and normal to one of its faces, and the problem reduces to dimension m— 1. It 
follows from Lemma 10.1 that the optimum of (5.8) can be achieved by a C that has only 
one element. If C contains two vertices of the hypercube, and the Hamming distance 
between those points is greater than or equal to 2, then the constraint Eaf/i*^] < is 
infeasible if a > Y^j=\ Dj + D m (because hP > in that case). Therefore, we conclude 
using Proposition 5.7. □ 

Proof of Theorem 5.11. First, we observe that we always have 

W(^Hfd) < ^GAmcd). (10-23) 

We observe from equation (10.10) that the maximizer (h ) of U(Amcd) is linear (see 
Figure 10.1), and hence is also an admissible function under U^Amd). This finishes the 
proof. □ 

Proof of Theorem 5.13. Just as for m = 2, equation (10.23) is always satisfied. Next, 
observing that J-\, in Theorem 5.4, is associated with a linear maximizer h (see Figure 
10.2), we deduce that 

T\ < U(Amd) < max(Ji, 

This finishes the proof for equation (5.12). Let us now prove equation (5.13). Assuming 
that U(Amd) = ^(-4mcd)> it follows that U{Amcd) can be achieved by a linear function 
h$. Since at the optimum we must have E[/io] = 0, and since min(/io,a) is also a 
maximizer of IA{AmcT>)i it follows that min(/io,a) = h^. Using the linearity of /io, and 
the lattice structure of the set of functions in Z^(^Imcd), we deduce that ho = h c , where 
C contains only one vertex of the cube. It follows that T\ > T 2 . This finishes the 
proof. □ 
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